Pandas Groupby Custom Groups

May 08, 2024

Let's say I have a dataframe like this:

df = pd.DataFrame({'A': [1, 2, 3, 4, 5, 6], 'B': ['a', 'a', 'b', 'b', 'c', 'c']})
print(df)
   A  B
0  1  a
1  2  a
2  3  b
3  4  b
4  5  c
5  6  c

Solution 1:

I think it really depends on the function you want to apply. For a sum, for example, one trick uses DataFrame.expanding: compute the expanding (running) sum over all rows, then use DataFrame.where to keep only the rows at which an entire group has been included, that is, the last row of each group. This assumes the rows of each group are contiguous.

# select the numeric column explicitly so the string column B is not passed to sum
df[['A']].expanding().sum().where(df['B'].ne(df['B'].shift(-1)))
      A
0   NaN
1   3.0
2   NaN
3  10.0
4   NaN
5  21.0
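
To make the trick above easier to follow, here is a minimal sketch that splits it into its two parts. The names is_group_end and running_total are chosen here for illustration and are not from the original answer; the sketch also works on the Series df['A'] rather than on the whole frame.

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3, 4, 5, 6],
                   'B': ['a', 'a', 'b', 'b', 'c', 'c']})

# True where the next row carries a different label,
# i.e. on the last row of each run of equal labels.
is_group_end = df['B'].ne(df['B'].shift(-1))

# Running total of the numeric column only.
running_total = df['A'].expanding().sum()

# Keep the running total at group boundaries; all other rows become NaN.
result = running_total.where(is_group_end)
print(result)
```

Because the mask only compares each row with the next one, a label that appears in two separate runs would be treated as two groups, so the data should be sorted by B first if that can happen.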

# same as above, then drop the rows that are not group boundaries
df[['A']].expanding().sum().where(df['B'].ne(df['B'].shift(-1))).loc[lambda x: x.A.notna()]

      A
1   3.0
3  10.0
5  21.0

UPDATED

We can also use DataFrame.groupby + DataFrame.expanding: sum each group first, then take the expanding sum of the group totals:

df.groupby('B').sum().expanding().sum()

To get the expected output:

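new_df = (df.groupby('B').sum().expanding().sum()
            .reset_index()
            # build cumulative labels: 'a or ', 'a or b or ', ...
            .assign(B = lambda x: x.B.add(' or ').cumsum()
                                  # removesuffix (pandas >= 1.4) drops exactly the trailing ' or ';
                                  # rstrip(' or ') would strip any trailing ' ', 'o' or 'r'
                                  .str.removesuffix(' or '))
            .set_index('B') )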
print(new_df)
                A
B                
a             3.0
a or b       10.0
a or b or c  21.0
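
As a possible variation (not part of the original answer), the expanding sum of the per-group totals is the same as their cumulative sum, so the result can also be sketched with cumsum, with the labels built by itertools.accumulate instead of string cumsum. The names group_sums, cumulative and labels are illustrative.

```python
from itertools import accumulate

import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3, 4, 5, 6],
                   'B': ['a', 'a', 'b', 'b', 'c', 'c']})

# Total of A per group, keeping the groups in order of first appearance.
group_sums = df.groupby('B', sort=False)['A'].sum()

# Cumulative total across groups: a, then a + b, then a + b + c.
cumulative = group_sums.cumsum()

# Cumulative labels: 'a', 'a or b', 'a or b or c'.
labels = list(accumulate(group_sums.index,
                         lambda acc, name: f"{acc} or {name}"))
cumulative.index = pd.Index(labels, name='B')
print(cumulative)
```

Unlike the expanding version, cumsum keeps the integer dtype of A, so the values come out as 3, 10 and 21 rather than 3.0, 10.0 and 21.0.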
