
Pandas Groupby Custom Groups

Let's say I have a dataframe like this:

import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3, 4, 5, 6], 'B': ['a', 'a', 'b', 'b', 'c', 'c']})
print(df)
   A  B
0  1  a
1  2  a
2  3  b
3  4  b
4  5  c
5  6  c

Solution 1:

I think it really depends on the function you want to use. For a sum, for example, one trick is to use DataFrame.expanding: compute the running total over all rows, then use DataFrame.where to keep only the rows where an entire group has just been completed. This assumes that rows belonging to the same group are adjacent.

# select the numeric column explicitly: pandas 2.0+ raises on non-numeric columns here
df[['A']].expanding().sum().where(df['B'].ne(df['B'].shift(-1)))
      A
0   NaN
1   3.0
2   NaN
3  10.0
4   NaN
5  21.0

# same as above, then keep only the rows where a group ends
df[['A']].expanding().sum().where(df['B'].ne(df['B'].shift(-1))).loc[lambda x: x.A.notna()]

      A
1   3.0
3  10.0
5  21.0
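To make the mask explicit, here is a minimal sketch of the same idea split into steps. The variable names is_group_end and running_total are only illustrative, and the sketch assumes that rows of the same group sit next to each other in the frame.

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3, 4, 5, 6],
                   'B': ['a', 'a', 'b', 'b', 'c', 'c']})

# True where the next row has a different label (or there is no next row),
# i.e. on the last row of each consecutive run of equal values in B.
is_group_end = df['B'].ne(df['B'].shift(-1))
print(is_group_end.tolist())  # [False, True, False, True, False, True]

# Running total of A over all rows seen so far.
running_total = df['A'].expanding().sum()

# Only the totals at group ends cover whole groups.
result = running_total[is_group_end]
print(result.tolist())  # [3.0, 10.0, 21.0]
```

If the frame is not ordered by B, sorting it first (for example with df.sort_values('B')) should restore the adjacency this trick relies on.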

UPDATED

We can also use DataFrame.groupby + DataFrame.expanding

df.groupby('B').sum().expanding().sum()
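As a sketch of what this chain does, the two steps can be inspected separately: first one sum per group, then a running total over those sums. The names per_group, running and running_int are illustrative; the last line uses cumsum as an alternative, which should give the same totals while keeping the integer dtype.

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3, 4, 5, 6],
                   'B': ['a', 'a', 'b', 'b', 'c', 'c']})

# Step 1: one sum per group, indexed by the group label.
per_group = df.groupby('B')['A'].sum()
print(per_group.tolist())  # [3, 7, 11]

# Step 2: running total across the groups (expanding returns floats).
running = per_group.expanding().sum()
print(running.tolist())  # [3.0, 10.0, 21.0]

# Alternative for step 2: cumsum gives the same totals as integers.
running_int = per_group.cumsum()
print(running_int.tolist())  # [3, 10, 21]
```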

To get the expected output:

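new_df = (df.groupby('B').sum().expanding().sum()
            .reset_index()
            # build cumulative labels: 'a or ', 'a or b or ', ...
            .assign(B = lambda x: x.B.add(' or ').cumsum()
                                  # slice off the trailing ' or '; rstrip(' or ') would
                                  # strip any trailing ' ', 'o' or 'r' characters instead
                                  .str[:-len(' or ')])
            .set_index('B') )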
print(new_df)
                A
B                
a             3.0
a or b       10.0
a or b or c  21.0
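Since the expanding-sum trick depends on the function, here is a rough sketch of a more general (and slower) approach for aggregations that cannot be built up from per-group results, such as a mean. The helper cumulative_group_apply is a hypothetical name introduced for illustration, not part of pandas; it simply recomputes the function over the union of the first 1, 2, ... groups.

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3, 4, 5, 6],
                   'B': ['a', 'a', 'b', 'b', 'c', 'c']})


def cumulative_group_apply(frame, key, value, func):
    """Hypothetical helper: apply func to frame[value] over the union of
    the first 1, 2, ... groups of frame[key], in order of first appearance."""
    labels = list(frame[key].unique())
    out = {}
    for i in range(1, len(labels) + 1):
        seen = labels[:i]
        # All rows whose label belongs to the groups seen so far.
        subset = frame.loc[frame[key].isin(seen), value]
        out[' or '.join(map(str, seen))] = func(subset)
    return pd.Series(out, name=value).rename_axis(key)


means = cumulative_group_apply(df, 'B', 'A', lambda s: s.mean())
print(means.index.tolist())  # ['a', 'a or b', 'a or b or c']
print(means.tolist())        # [1.5, 2.5, 3.5]
```

This rescans the frame once per group, so for sums the groupby + expanding version above is likely the better choice; the loop is only worth it when the function needs the raw rows.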
