
Return N Smallest Indexes By Column Using Pandas

I have the following (simplified) dataframe: df = pd.DataFrame({'X': [1, 2, 3, 4, 5,6,7,8,9,10], 'Y': [10,20,30,40,50,-10,-20,-30,-40,-50], 'Z': [20,18,16,14,12,10,8,6,4,2]},index=

Solution 1:

You can use apply with nsmallest:

n = 3
df.apply(lambda x: pd.Series(x.nsmallest(n).index))

#   X   Y   Z
#0  A   J   J
#1  B   I   I
#2  C   H   H
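The dataframe definition in the question is cut off at `index=`, so the following is only a sketch of a self-contained run. It assumes the index labels are the letters A through J, which is what the outputs shown here suggest; the variable name `result` is also just illustrative.

```python
import pandas as pd

# Assumption: index labels A..J, inferred from the printed outputs
# (the original definition is truncated after "index=").
df = pd.DataFrame(
    {
        "X": [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
        "Y": [10, 20, 30, 40, 50, -10, -20, -30, -40, -50],
        "Z": [20, 18, 16, 14, 12, 10, 8, 6, 4, 2],
    },
    index=list("ABCDEFGHIJ"),
)

n = 3
# For each column, take the n smallest values (sorted ascending)
# and keep only their index labels.
result = df.apply(lambda col: pd.Series(col.nsmallest(n).index))
print(result)
```

Each column of `result` lists the labels of that column's smallest values, smallest first.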

Solution 2:

Faster numpy solution with numpy.argsort:

N = 3
a = np.argsort(-df.values, axis=0)[-1:-1-N:-1]
print (a)
[[0 9 9]
 [1 8 8]
 [2 7 7]]

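# index the underlying label array; recent pandas versions reject 2-D indexing on an Index
b = pd.DataFrame(df.index.to_numpy()[a], columns=df.columns)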
print (b)
   X  Y  Z
0  A  J  J
1  B  I  I
2  C  H  H
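The negate-and-reverse slicing can probably be simplified. As a sketch, assuming the same A through J index as the outputs suggest and no tied values within a column, a plain ascending `argsort` followed by taking the first N rows should give the same positions. The names `pos` and `smallest` are illustrative only.

```python
import numpy as np
import pandas as pd

# Assumption: index labels A..J, inferred from the printed outputs.
df = pd.DataFrame(
    {
        "X": [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
        "Y": [10, 20, 30, 40, 50, -10, -20, -30, -40, -50],
        "Z": [20, 18, 16, 14, 12, 10, 8, 6, 4, 2],
    },
    index=list("ABCDEFGHIJ"),
)

N = 3
# Ascending argsort down each column; the first N rows hold the
# row positions of the N smallest values in that column.
pos = np.argsort(df.to_numpy(), axis=0)[:N]

# Map row positions back to index labels via the label array.
smallest = pd.DataFrame(df.index.to_numpy()[pos], columns=df.columns)
print(smallest)
```

With tied values the order among the ties may differ from the original expression, since `argsort` is not stable by default.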

Timings:

In [111]: %timeit (pd.DataFrame(df.index[np.argsort(-df.values, axis=0)[-1:-1-N:-1]], columns=df.columns))
159 µs ± 1.37 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)

In [112]: %timeit (df.apply(lambda x: pd.Series(x.nsmallest(N).index)))
3.52 ms ± 49.7 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

Solution 3:

Sort the dataframe by each column in turn, collect the index labels of each sorted result into a list, build a new dataframe from those lists, and return its first n rows.

def topN(df, n):
    # first, sort the dataframe by each column
    sort_x = df.sort_values(by=['X'], ascending=True)
    sort_y = df.sort_values(by=['Y'], ascending=True)
    sort_z = df.sort_values(by=['Z'], ascending=True)
    # now get a list of the indices of each sorted df
    index_list_x = sort_x.index.values.tolist()
    index_list_y = sort_y.index.values.tolist()
    index_list_z = sort_z.index.values.tolist()
    # create a dataframe from the lists
    sorted_df = pd.DataFrame(
        {'sorted_x': index_list_x,
         'sorted_y': index_list_y,
         'sorted_z': index_list_z
        })
    # return the top n rows from the sorted dataframe
    return sorted_df.iloc[0:n]

topN(df,3)

Returns:

  sorted_x sorted_y sorted_z
0        A        J        J
1        B        I        I
2        C        H        H
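The function above hard-codes the columns X, Y and Z. A possible generalization, sketched here under the assumption that every column is sortable and that the index is A through J as the outputs suggest, loops over whatever columns the dataframe has. The name `top_n_index` is hypothetical and not part of the original answer.

```python
import pandas as pd


def top_n_index(df, n):
    """Return the index labels of the n smallest values in each column."""
    return pd.DataFrame(
        {
            # sort one column ascending, keep the first n index labels
            col: df[col].sort_values().index[:n].tolist()
            for col in df.columns
        }
    )


# Assumption: index labels A..J, inferred from the printed outputs.
df = pd.DataFrame(
    {
        "X": [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
        "Y": [10, 20, 30, 40, 50, -10, -20, -30, -40, -50],
        "Z": [20, 18, 16, 14, 12, 10, 8, 6, 4, 2],
    },
    index=list("ABCDEFGHIJ"),
)

top3 = top_n_index(df, 3)
print(top3)
```

Unlike `topN`, this version keeps the original column names and sorts only one column at a time rather than the whole dataframe.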
