Divide Last By First Row Of A Group
I have a dataframe that looks like this: ID Quantity Time 54 100 2020-01-01 00:00:05 54 87 2020-01-01 00:00:06 58 400 2020-01-01 00:00:08 58 390 2020-01-0
Solution 1:
If every ID always occupies exactly two consecutive rows, you can select the second row of each pair with the slice [1::2] and divide it by the previous row's value, obtained with Series.shift:
df['Result'] = df['Quantity'].iloc[1::2].div(df['Quantity'].shift())
print (df)
ID Quantity Time Result
0 54 100 2020-01-01 00:00:05 NaN
1 54 87 2020-01-01 00:00:06 0.870
2 58 400 2020-01-01 00:00:08 NaN
3 58 390 2020-01-01 00:00:14 0.975
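To reproduce this, here is a minimal self-contained sketch. The frame is rebuilt by hand from the printed output above, and the pair check (`pairs_ok`) is my own addition rather than part of the answer; it only guards the assumption that rows 0-1, 2-3, ... share an ID.

```python
import pandas as pd

# Sample frame rebuilt by hand from the printed output above.
df = pd.DataFrame({
    "ID": [54, 54, 58, 58],
    "Quantity": [100, 87, 400, 390],
    "Time": pd.to_datetime([
        "2020-01-01 00:00:05", "2020-01-01 00:00:06",
        "2020-01-01 00:00:08", "2020-01-01 00:00:14",
    ]),
})

# Assumption check (an addition): rows 0-1, 2-3, ... must share an ID.
ids = df["ID"].to_numpy()
pairs_ok = len(ids) % 2 == 0 and (ids[0::2] == ids[1::2]).all()

if pairs_ok:
    # Every second row divided by the row directly above it.
    # .div aligns on the index, so rows 0 and 2 come out as NaN.
    df["Result"] = df["Quantity"].iloc[1::2].div(df["Quantity"].shift())
```

If `pairs_ok` is false, the positional slice is not safe and one of the group-based solutions is the better choice.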
Solution working with groups:
Another idea is to use GroupBy.transform with GroupBy.first to repeat each ID's first value on all of its rows, then divide the last row per ID, selected with DataFrame.drop_duplicates, by that value:
first= df.groupby('ID')['Quantity'].transform('first')
df['Result'] = df.drop_duplicates('ID', keep='last')['Quantity'].div(first)
print (df)
ID Quantity Time Result
0 54 100 2020-01-01 00:00:05 NaN
1 54 87 2020-01-01 00:00:06 0.870
2 58 400 2020-01-01 00:00:08 NaN
3 58 390 2020-01-01 00:00:14 0.975
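The reason the other rows end up as NaN is index alignment: the drop_duplicates result only contains the last row of each ID, so dividing it by the full-length `first` series leaves every other index label empty. A small sketch that wraps the two steps in a helper follows; the function name `last_over_first` and its parameters are hypothetical, and a unique index (such as the default RangeIndex) is assumed.

```python
import pandas as pd

def last_over_first(df, key="ID", value="Quantity"):
    """Return last/first of `value` per `key`, placed on each group's last row."""
    # First value of each group, repeated on every row of that group.
    first = df.groupby(key)[value].transform("first")
    # Only the last row per key survives here.
    last = df.drop_duplicates(key, keep="last")[value]
    # .div aligns on the index; rows missing from `last` become NaN.
    return last.div(first).reindex(df.index)

df = pd.DataFrame({
    "ID": [54, 54, 58, 58],
    "Quantity": [100, 87, 400, 390],
})
df["Result"] = last_over_first(df)
```

Both steps rely on row order, so the frame should already be sorted so that the row you consider "last" really comes last within its ID.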
Or use a lambda function that computes last / first per group, then replace everything except the last row of each ID with NaN using Series.mask:
df['Result'] = (df.groupby('ID')['Quantity'].transform(lambda x: x.iat[-1] / x.iat[0])
.mask(df['ID'].duplicated(keep='last')))
print (df)
ID Quantity Time Result
0 54 100 2020-01-01 00:00:05 NaN
1 54 87 2020-01-01 00:00:06 0.870
2 58 400 2020-01-01 00:00:08 NaN
3 58 390 2020-01-01 00:00:14 0.975
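The lambda runs once per group in Python, which can be slow with many IDs. A possible variation (not the answer's code) builds the same ratio from the named aggregations 'last' and 'first'. One difference to be aware of: as far as I know, these aggregations return the first/last non-null value, whereas x.iat[0] and x.iat[-1] are strictly positional, so the two can disagree when Quantity contains NaN. The single-row ID 60 below is made up for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    "ID": [54, 54, 58, 58, 60],
    "Quantity": [100, 87, 400, 390, 7],  # ID 60 is a made-up single-row group
})

g = df.groupby("ID")["Quantity"]

# Last and first value per ID, each repeated on every row of its group.
ratio = g.transform("last") / g.transform("first")

# Keep the ratio only on the final row of each ID; earlier rows become NaN.
is_last_row = ~df["ID"].duplicated(keep="last")
df["Result"] = ratio.where(is_last_row)
```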
What happens if at least one ID appears only once?
print (df)
ID Quantity Time
0 54 100 2020-01-01 00:00:05
1 58 400 2020-01-01 00:00:08
2 58 390 2020-01-01 00:00:14
#failed
df['Result1'] = df['Quantity'].iloc[1::2].div(df['Quantity'].shift())
#working correct
first = df.groupby('ID')['Quantity'].transform('first')
df['Result2'] = df.drop_duplicates('ID', keep='last')['Quantity'].div(first)
#working correct
df['Result3'] = (df.groupby('ID')['Quantity'].transform(lambda x: x.iat[-1] / x.iat[0])
.mask(df['ID'].duplicated(keep='last')))
print (df)
ID Quantity Time Result1 Result2 Result3
0 54 100 2020-01-01 00:00:05 NaN 1.000 1.000
1 58 400 2020-01-01 00:00:08 4.0 NaN NaN
2 58 390 2020-01-01 00:00:14 NaN 0.975 0.975
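Result1 is wrong because the slice pairs rows by position, not by ID: row 1 (ID 58) gets divided by row 0 (ID 54), giving 400 / 100 = 4.0. A small sketch for detecting such cross-ID divisions is shown below; the names `positional`, `same_id_as_previous` and `crossed` are my own and only serve the illustration.

```python
import pandas as pd

df = pd.DataFrame({"ID": [54, 58, 58], "Quantity": [100, 400, 390]})

# Positional pairing: row 1 (ID 58) is divided by row 0 (ID 54).
positional = df["Quantity"].iloc[1::2].div(df["Quantity"].shift())

# True where a row has the same ID as the row directly above it.
same_id_as_previous = df["ID"].eq(df["ID"].shift())

# Rows where the positional result was computed across two different IDs.
crossed = positional.notna() & ~same_id_as_previous
```

Any True in `crossed` means the [1::2] approach has mixed values from different IDs and should not be trusted for that frame.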
Solution 2:
You could break this into a few steps.
Create a grouping:
grouping = df.groupby("ID", sort=False).Quantity
Divide the last value by the first per ID:
first_div_last = grouping.last().div(grouping.first()).array
Get a boolean mask marking the last row of each ID:
last_rows = df.Quantity.eq(grouping.transform("last"))
Assign the result to a new column in df :
df.loc[last_rows,"Result"]=first_div_last
df
ID Quantity Time Result
0 54 100 2020-01-01 00:00:05 NaN
1 54 87 2020-01-01 00:00:06 0.870
2 58 400 2020-01-01 00:00:08 NaN
3 58 390 2020-01-01 00:00:14 0.975
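One caveat with this solution: the mask compares Quantity values, so if the last Quantity of an ID also occurs earlier in the same ID, more rows are flagged than there are ratios and the assignment no longer lines up. A possible workaround, sketched under the assumption that rows are already ordered within each ID, marks the last row by position and maps the per-ID ratio onto it. The sample data (with a repeated 100 in ID 54) and the names `by_value` and `by_position` are made up for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    "ID": [54, 54, 54, 58, 58],
    "Quantity": [100, 87, 100, 400, 390],  # 100 appears twice within ID 54
})

grouping = df.groupby("ID", sort=False)["Quantity"]
ratio = grouping.last().div(grouping.first())  # indexed by ID

# Value-based mask from the answer: flags rows 0 and 2 for ID 54.
by_value = df["Quantity"].eq(grouping.transform("last"))

# Position-based mask: True only on the final row of each ID.
by_position = ~df["ID"].duplicated(keep="last")

# Map each ID's ratio onto its final row; all other rows stay NaN.
df["Result"] = df["ID"].map(ratio).where(by_position)
```

Here `by_value` flags three rows while only two ratios exist, whereas `by_position` flags exactly one row per ID.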