Pandas Groupby Custom Groups
Let's say I have a dataframe like this:
df = pd.DataFrame({'A': [1, 2, 3, 4, 5, 6], 'B': ['a', 'a', 'b', 'b', 'c', 'c']})
print(df)
   A  B
0  1  a
1  2  a
2  3  b
3  4  b
4  5  c
5  6  c
Solution 1:
I think it really depends on the function you want to use.
For a sum, for example, I can think of a trick with DataFrame.expanding. The idea is to compute the expanding (running) sum over all rows and then, with DataFrame.where, keep only the rows where an entire group has been included, that is, the last row of each group. This assumes that the rows of each group are adjacent in the dataframe.
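# column A is selected explicitly: recent pandas versions no longer silently drop the non-numeric column B
df[['A']].expanding().sum().where(df['B'].ne(df['B'].shift(-1)))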
      A
0   NaN
1   3.0
2   NaN
3  10.0
4   NaN
5  21.0
To keep only the rows where a group is complete:
df[['A']].expanding().sum().where(df['B'].ne(df['B'].shift(-1))).loc[lambda x: x.A.notna()]
      A
1   3.0
3  10.0
5  21.0
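For reference, the trick above can be put together as a self-contained sketch. It assumes that rows belonging to the same group are contiguous in B, and it works on column A directly so that the non-numeric column B never reaches the aggregation. The variable names is_last_in_group and running_total are illustrative only.

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3, 4, 5, 6],
                   'B': ['a', 'a', 'b', 'b', 'c', 'c']})

# True on the last row of each contiguous run of equal B values
is_last_in_group = df['B'].ne(df['B'].shift(-1))

# Running total of A over all rows seen so far
running_total = df['A'].expanding().sum()

# Keep the running total only on rows where a group has just been completed
result = running_total[is_last_in_group]
print(result)
```

Boolean indexing replaces the where + notna pair here; both select the same rows.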
UPDATED
We can also use DataFrame.groupby + DataFrame.expanding:
df.groupby('B').sum().expanding().sum()
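To make the two steps of that one-liner visible, here is a minimal sketch that splits it up. The names per_group and cumulative are illustrative, and sort=False is an added assumption so that groups keep their order of appearance instead of being sorted by label.

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3, 4, 5, 6],
                   'B': ['a', 'a', 'b', 'b', 'c', 'c']})

# Step 1: one total per group; sort=False keeps groups in order of appearance
per_group = df.groupby('B', sort=False)['A'].sum()

# Step 2: running total across the per-group totals
cumulative = per_group.expanding().sum()

print(per_group.tolist())   # [3, 7, 11]
print(cumulative.tolist())  # [3.0, 10.0, 21.0]
```

This composition works for a sum because a sum of group sums equals the sum over all rows; it would not carry over unchanged to something like a mean.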
To get the expected output:
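new_df = (df.groupby('B').sum().expanding().sum()
            .reset_index()
            # build the labels 'a', 'a or b', ...; removesuffix (pandas >= 1.4) drops only the
            # trailing ' or ', whereas rstrip(' or ') would strip any trailing ' ', 'o' or 'r'
            .assign(B=lambda x: x.B.add(' or ').cumsum()
                                 .str.removesuffix(' or '))
            .set_index('B'))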
print(new_df)
                A
B                
a             3.0
a or b       10.0
a or b or c  21.0
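Since the right approach depends on the function, a more general (if slower) sketch is to loop over the growing set of groups and apply the function to the matching rows each time. The helper cumulative_group_agg below is hypothetical and not part of pandas; it is shown with a mean, which cannot be obtained by chaining per-group results the way a sum can.

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3, 4, 5, 6],
                   'B': ['a', 'a', 'b', 'b', 'c', 'c']})

def cumulative_group_agg(frame, key, value, func):
    """Apply func to `value` over the first 1, 2, ... groups of `key`.

    Hypothetical helper for illustration; groups are taken in order of appearance.
    """
    groups = list(frame[key].unique())
    results = {}
    for i in range(1, len(groups) + 1):
        selected = groups[:i]
        # Label such as 'a or b', built with join so no suffix stripping is needed
        label = ' or '.join(map(str, selected))
        results[label] = func(frame.loc[frame[key].isin(selected), value])
    return pd.Series(results, name=value)

means = cumulative_group_agg(df, 'B', 'A', lambda s: s.mean())
print(means)
```

Each pass rescans the dataframe, so this trades speed for generality; for sums, the groupby + expanding version above remains the cheaper choice.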