Dataframe Classification And Sorting Optimization Problem

For each of the two letters in the DataFrame column 'category', I want to take the 4 largest values in the other column 'data1' and sort the result according to certain rules. I used the method of slic

Solution 1:

Use:

import numpy as np
import pandas as pd

# fixed seed so the random data is the same for comparison
np.random.seed(2021)


df = pd.DataFrame()
n = 200
df['category'] = np.random.choice(('A', 'B'), n)
df['data1'] = np.random.randint(1, 10000, len(df))
df['data2'] = np.random.randint(1, 10000, len(df))
a = df[df['category'] == 'A'].sort_values(by='data1', ascending=False).head(4)
b = df[df['category'] == 'B'].sort_values(by='data1', ascending=False).head(4)
df1 = pd.concat([a, b]).sort_values(by=['category', 'data1'], ascending=[True, False]).reset_index(drop=True)
print(df1)
  category  data1  data2
0        A   9882   9868
1        A   9855   6701
2        A   9798   1058
3        A   9669   7334
4        B   9973   3668
5        B   9900   4340
6        B   9846   7885
7        B   9659   4933

Use DataFrame.sort_values by both columns first and then add GroupBy.head:

df1 = (df.sort_values(by=['category', 'data1'], ascending=[True, False])
         .groupby('category')
         .head(4)
         .reset_index(drop=True))
print(df1)
  category  data1  data2
0        A   9882   9868
1        A   9855   6701
2        A   9798   1058
3        A   9669   7334
4        B   9973   3668
5        B   9900   4340
6        B   9846   7885
7        B   9659   4933
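
As a rough check that the sort-then-GroupBy.head approach matches the per-category slicing, the sketch below runs both on the same data and compares the results. The helper name top_n_per_group, the seed, and the sample data are assumptions made for illustration and are not part of the original answer. The 'data1' values are drawn without repeats, so ties cannot change the row order.

```python
import numpy as np
import pandas as pd


def top_n_per_group(df, group_col, value_col, n):
    """Hypothetical helper: keep the n largest value_col rows in each group."""
    return (df.sort_values(by=[group_col, value_col], ascending=[True, False])
              .groupby(group_col)
              .head(n)
              .reset_index(drop=True))


# Illustrative data (assumed, not the original seed or values).
rng = np.random.default_rng(0)
df = pd.DataFrame({
    'category': rng.choice(['A', 'B'], 200),
    'data1': rng.permutation(10000)[:200],  # distinct values, so no ties
    'data2': rng.integers(1, 10000, 200),
})

# Per-category slicing, written as a loop over the two letters.
parts = [
    df[df['category'] == c].sort_values(by='data1', ascending=False).head(4)
    for c in ('A', 'B')
]
sliced = (pd.concat(parts)
            .sort_values(by=['category', 'data1'], ascending=[True, False])
            .reset_index(drop=True))

# Single sort followed by GroupBy.head.
grouped = top_n_per_group(df, 'category', 'data1', 4)

# Both frames should hold the same rows in the same order.
print(grouped.equals(sliced))
```

The grouped version does not list the category values by hand, so it should carry over unchanged if more letters appear in 'category'.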
