
Faster Way To Transform Group With Mean Value In Pandas

I have a Pandas dataframe where I am trying to replace the values in each group by the mean of the group. On my machine, the line df['signal'].groupby(g).transform(np.mean) takes a

Solution 1:

Current method, using transform

In [44]: grp = df["signal"].groupby(g)

In [45]: result2 = df["signal"].groupby(g).transform(np.mean)

In [47]: %timeit df["signal"].groupby(g).transform(np.mean)
1 loops, best of 3: 535 ms per loop
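The question's data is not shown, so here is a minimal sketch for reproducing the baseline. It assumes `df` has a float `signal` column and `g` is an array of integer group labels. The sizes and the random data are made up for illustration. It passes the string alias `"mean"` to `transform` instead of `np.mean`. Whether that is faster depends on your pandas version, so time it on your own data.

```python
import numpy as np
import pandas as pd

# Hypothetical data: the original frame is not shown in the question.
rng = np.random.default_rng(0)
n_groups, rows_per_group = 1000, 50
g = np.repeat(np.arange(n_groups), rows_per_group)  # sorted, contiguous labels
df = pd.DataFrame({"signal": rng.normal(size=g.size)})

grp = df["signal"].groupby(g)

# Baseline: every row is replaced by the mean of its group.
baseline = grp.transform("mean")
```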

Using 'broadcasting' of the results

In [43]: result = pd.concat([ Series([r]*len(grp.groups[i])) for i, r in enumerate(grp.mean().values) ],ignore_index=True)

In [42]: %timeit pd.concat([ Series([r]*len(grp.groups[i])) for i, r in enumerate(grp.mean().values) ],ignore_index=True)
10 loops, best of 3: 119 ms per loop

In [46]: result.equals(result2)
Out[46]: True

I think you may need to set the index on the broadcast result yourself (it happens to work here only because df has a default index):

result = pd.concat([ Series([r]*len(grp.groups[i])) for i, r in enumerate(grp.mean().values) ],ignore_index=True)
result.index = df.index
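A small self-contained sketch of the same broadcast idea on a frame with a non-default index. The toy data and the `means`/`sizes` names are mine, not from the answer. It pairs each group mean with its group size directly rather than looking groups up by position, and it assumes the rows are already sorted by group, which the concatenation order relies on.

```python
import numpy as np
import pandas as pd

# Toy frame with a non-default index (hypothetical data).
df = pd.DataFrame({"signal": [1.0, 3.0, 10.0, 20.0, 30.0]}, index=list("abcde"))
g = np.array([0, 0, 1, 1, 1])  # rows must already be sorted by group

grp = df["signal"].groupby(g)
means = grp.mean()   # one value per group, ordered by group key
sizes = grp.size()   # rows per group, same ordering

# Repeat each group's mean once per row in that group, then stitch together.
pieces = [pd.Series([m] * int(n)) for m, n in zip(means.to_numpy(), sizes.to_numpy())]
result = pd.concat(pieces, ignore_index=True)

# Restore the original index so the result aligns with df.
result.index = df.index
```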

Solution 2:

Inspired by Jeff's answer. This is the fastest method on my machine:

pd.Series(np.repeat(grp.mean().values, grp.count().values))
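Two caveats apply to the one-liner above: `count()` skips NaN values, so the repeat lengths can come up short when the column has missing data, and `np.repeat` only lines up with the rows when they are sorted by group. The sketch below, on made-up data, uses `size()` instead and reattaches the index. The second half shows one possible alternative for unsorted labels, mapping each label to its group mean. It is an illustration, not part of the original answer, and it has not been timed.

```python
import numpy as np
import pandas as pd

# Toy data with a missing value (hypothetical).
df = pd.DataFrame({"signal": [1.0, np.nan, 3.0, 10.0, 20.0]})
g = np.array([0, 0, 0, 1, 1])  # sorted by group, as np.repeat requires

grp = df["signal"].groupby(g)

# size() counts every row, including NaN rows; count() would not.
fast = pd.Series(
    np.repeat(grp.mean().to_numpy(), grp.size().to_numpy()),
    index=df.index,
)

# Unsorted labels: look up each row's group mean by label instead.
g_shuffled = np.array([1, 0, 1, 0, 0])
means = df["signal"].groupby(g_shuffled).mean()
mapped = pd.Series(g_shuffled, index=df.index).map(means)
```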
