
Divide Last By First Row Of A Group

I have a dataframe that looks like this:

ID  Quantity  Time
54  100       2020-01-01 00:00:05
54  87        2020-01-01 00:00:06
58  400       2020-01-01 00:00:08
58  390       2020-01-0

Solution 1:

If the IDs always come in pairs, you can select the second row of each pair by positional indexing with [1::2] and divide by the previous row's value, obtained with Series.shift:

df['Result'] = df['Quantity'].iloc[1::2].div(df['Quantity'].shift())
print (df)
   ID  Quantity                 Time  Result
0  54       100  2020-01-01 00:00:05     NaN
1  54        87  2020-01-01 00:00:06   0.870
2  58       400  2020-01-01 00:00:08     NaN
3  58       390  2020-01-01 00:00:14   0.975

Solution working with groups:

Another idea is to use GroupBy.transform with GroupBy.first to get the first value per ID, select the last row per ID with DataFrame.drop_duplicates, and divide the latter by the former:

first= df.groupby('ID')['Quantity'].transform('first')
df['Result'] = df.drop_duplicates('ID', keep='last')['Quantity'].div(first)
print (df)
   ID  Quantity                 Time  Result
0  54       100  2020-01-01 00:00:05     NaN
1  54        87  2020-01-01 00:00:06   0.870
2  58       400  2020-01-01 00:00:08     NaN
3  58       390  2020-01-01 00:00:14   0.975

Or use a lambda function and replace every row except the last one per ID with NaN using Series.mask:

df['Result'] = (df.groupby('ID')['Quantity'].transform(lambda x: x.iat[-1] / x.iat[0])
                  .mask(df['ID'].duplicated(keep='last')))
print (df)

   ID  Quantity                 Time  Result
0  54       100  2020-01-01 00:00:05     NaN
1  54        87  2020-01-01 00:00:06   0.870
2  58       400  2020-01-01 00:00:08     NaN
3  58       390  2020-01-01 00:00:14   0.975
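
As a self-contained sketch of the group-based idea above, the lambda can also be replaced by two built-in transforms. The sample frame is rebuilt here without the Time column, and the variable names are only illustrative:

```python
import pandas as pd

# Sample data from the question (Time column omitted for brevity).
df = pd.DataFrame({
    "ID": [54, 54, 58, 58],
    "Quantity": [100, 87, 400, 390],
})

g = df.groupby("ID")["Quantity"]

# last / first per ID, broadcast to every row of the group.
ratio = g.transform("last") / g.transform("first")

# Keep the ratio only on the last row of each ID; other rows become NaN.
is_last = ~df["ID"].duplicated(keep="last")
df["Result"] = ratio.where(is_last)
print(df)
```

This avoids calling a Python function once per group, which may matter on frames with many IDs.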

What happens if at least one group has only a single row (a unique ID)?

print (df)
   ID  Quantity                 Time
0  54       100  2020-01-01 00:00:05
1  58       400  2020-01-01 00:00:08
2  58       390  2020-01-01 00:00:14

#failed
df['Result1'] = df['Quantity'].iloc[1::2].div(df['Quantity'].shift())
#working correct
first = df.groupby('ID')['Quantity'].transform('first')
df['Result2'] = df.drop_duplicates('ID', keep='last')['Quantity'].div(first)
#working correct
df['Result3'] = (df.groupby('ID')['Quantity'].transform(lambda x: x.iat[-1] / x.iat[0])
                  .mask(df['ID'].duplicated(keep='last')))
print (df)
   ID  Quantity                 Time  Result1  Result2  Result3
0  54       100  2020-01-01 00:00:05      NaN    1.000    1.000
1  58       400  2020-01-01 00:00:08      4.0      NaN      NaN
2  58       390  2020-01-01 00:00:14      NaN    0.975    0.975
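
The positional approach fails here because row 1 is divided by a row from a different ID. A minimal sketch of a guard, assuming you want to detect this case up front: count the rows per ID before relying on [1::2]. Blanking out single-row groups at the end is an assumption about the desired output, not part of the original answer, which reports 1.0 for them.

```python
import pandas as pd

df = pd.DataFrame({
    "ID": [54, 58, 58],
    "Quantity": [100, 400, 390],
})

# Rows per ID. The [1::2] trick needs every count to be 2
# (and the two rows of each ID to be adjacent).
sizes = df.groupby("ID").size()
all_pairs = bool(sizes.eq(2).all())
print(all_pairs)

# Group-based ratio, valid for any group size.
first = df.groupby("ID")["Quantity"].transform("first")
result = df.drop_duplicates("ID", keep="last")["Quantity"].div(first)

# Optional: report NaN instead of 1.0 for IDs that have a single row.
single = df["ID"].map(sizes).eq(1)
df["Result"] = result.mask(single)
print(df)
```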

Solution 2:

You could break this into a few steps:

Create a grouping :

grouping = df.groupby("ID", sort=False).Quantity

Divide the last by the first per ID:

first_div_last = grouping.last().div(grouping.first()).array

Get a boolean for the last row for each ID:

last_rows = df.Quantity.eq(grouping.transform("last"))

Assign the result to a new column in df :

df.loc[last_rows, "Result"] = first_div_last
df
   ID  Quantity                 Time  Result
0  54       100  2020-01-01 00:00:05     NaN
1  54        87  2020-01-01 00:00:06   0.870
2  58       400  2020-01-01 00:00:08     NaN
3  58       390  2020-01-01 00:00:14   0.975
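
One caveat with the steps above: the boolean mask compares Quantity values, so an earlier row whose Quantity equals the last value of its ID is flagged as well, and the assignment then fails on a length mismatch. A possible variant, sketched here with made-up data containing such a tie, selects the last rows by position and maps the per-ID ratio back by ID instead of relying on row order:

```python
import pandas as pd

# Hypothetical data: ID 54 has the same Quantity in both rows.
df = pd.DataFrame({
    "ID": [54, 54, 58, 58],
    "Quantity": [100, 100, 400, 390],
})

grouping = df.groupby("ID", sort=False)["Quantity"]

# last / first per ID, as a Series indexed by ID.
ratio = grouping.last().div(grouping.first())

# True only on the last row of each ID, regardless of the values.
last_rows = ~df["ID"].duplicated(keep="last")

# Look up each last row's ratio by its ID.
df.loc[last_rows, "Result"] = df.loc[last_rows, "ID"].map(ratio)
print(df)
```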
