How To Find The Top Column Values Of Each Row In A Pandas Dataframe

For a given dataframe with m columns (let's assume m=10), I am trying to find the top n column values within each row (let's assume n=2). After finding these top n values for each row,

Solution 1:

The first idea is to compare each row with its top N values from Series.nlargest and then set the remaining values to 0 with DataFrame.where:

N = 2
df = df.where(df.apply(lambda x: x.eq(x.nlargest(N)), axis=1), 0)
print (df)
   col_A  col_B  col_C  col_D  col_E
0   0.00   0.00    0.0    0.4    0.5
1   0.00   0.10    0.1    0.0    0.0
2   0.24   0.24    0.0    0.0    0.0
3   0.00   0.25    0.3    0.0    0.0
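
The input frame is not shown above, so the following is only a minimal sketch of the same idea on a hypothetical frame. The sample values and the names `mask` and `result` are assumptions chosen for illustration; they were picked so that the result matches the printed output.

```python
import pandas as pd

# Hypothetical input (not from the original post)
df = pd.DataFrame({
    "col_A": [0.10, 0.06, 0.24, 0.20],
    "col_B": [0.20, 0.10, 0.24, 0.25],
    "col_C": [0.3, 0.1, 0.0, 0.3],
    "col_D": [0.4, 0.0, 0.0, 0.0],
    "col_E": [0.5, 0.0, 0.0, 0.0],
})

N = 2
# For each row, mark the positions whose labels appear in that row's N largest values.
# Labels missing from nlargest(N) align to NaN and therefore compare as False.
mask = df.apply(lambda x: x.eq(x.nlargest(N)), axis=1)
# Keep the marked values and replace everything else with 0
result = df.where(mask, 0)
print(result)
```

Because the comparison is aligned on column labels rather than on raw values, a row with more than N tied maxima should still keep only N of them.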

To improve performance, use NumPy instead; this solution is from @Divakar:

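import numpy as np

N = 2
#https://stackoverflow.com/a/61518029/2901002
# Column positions of the N largest values in each row (stable sort on negated values)
idx = np.argsort(-df.to_numpy(), kind='mergesort')[:,:N]
# Boolean mask that is True only at those positions
mask = np.zeros(df.shape, dtype=bool)
np.put_along_axis(mask, idx, True, axis=-1)
# Keep the masked values and set everything else to 0
df = df.where(mask, 0)
print(df)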
   col_A  col_B  col_C  col_D  col_E
0   0.00   0.00    0.0    0.4    0.5
1   0.00   0.10    0.1    0.0    0.0
2   0.24   0.24    0.0    0.0    0.0
3   0.00   0.25    0.3    0.0    0.0
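
The NumPy steps can also be wrapped in a small helper. This is a sketch only: the function name `keep_top_n` and the one-row frame `ties` are hypothetical and not part of the original answer. It also illustrates why a stable sort is used: with `kind="mergesort"`, tied values keep their left-to-right column order, so the leftmost ties are the ones retained.

```python
import numpy as np
import pandas as pd

def keep_top_n(df: pd.DataFrame, n: int) -> pd.DataFrame:
    """Return a copy of df with only the n largest values per row kept; others become 0."""
    # Negate the values so an ascending sort ranks the largest first;
    # mergesort is stable, so ties preserve column order.
    idx = np.argsort(-df.to_numpy(), kind="mergesort")[:, :n]
    mask = np.zeros(df.shape, dtype=bool)
    # Set True at the selected column positions of every row
    np.put_along_axis(mask, idx, True, axis=-1)
    return df.where(mask, 0)

# Hypothetical row with a three-way tie
ties = pd.DataFrame({"col_A": [0.1], "col_B": [0.1], "col_C": [0.1]})
top = keep_top_n(ties, 2)
print(top)
```

Here only col_A and col_B keep their value, since exactly n positions per row are marked regardless of ties.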
