Dataframe Classification And Sorting Optimization Problem
For each of the two letters in the dataframe column 'category', I want to take the 4 largest values in the other column 'data1' and sort the result according to certain rules. I used the method of slic
Solution 1:
Use:
import numpy as np
import pandas as pd

# fixed seed so the same random data is generated for comparison
np.random.seed(2021)
df = pd.DataFrame()
n = 200
df['category'] = np.random.choice(('A', 'B'), n)
df['data1'] = np.random.randint(1, 10000, len(df))
df['data2'] = np.random.randint(1, 10000, len(df))
a = df[df['category'] == 'A'].sort_values(by='data1', ascending=False).head(4)
b = df[df['category'] == 'B'].sort_values(by='data1', ascending=False).head(4)
df1 = pd.concat([a, b]).sort_values(by=['category', 'data1'], ascending=[True, False]).reset_index(drop=True)
print(df1)
  category  data1  data2
0        A   9882   9868
1        A   9855   6701
2        A   9798   1058
3        A   9669   7334
4        B   9973   3668
5        B   9900   4340
6        B   9846   7885
7        B   9659   4933
Use DataFrame.sort_values by both columns first and then add GroupBy.head:
df1 = (df.sort_values(by=['category', 'data1'], ascending=[True, False])
         .groupby('category')
         .head(4)
         .reset_index(drop=True))
print(df1)
  category  data1  data2
0        A   9882   9868
1        A   9855   6701
2        A   9798   1058
3        A   9669   7334
4        B   9973   3668
5        B   9900   4340
6        B   9846   7885
7        B   9659   4933
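The sort-then-head approach can be checked by hand on a tiny frame. The sketch below is only an illustration: the toy values and the helper name top_n_per_group are hypothetical and not part of the original post. It assumes the same column names ('category', 'data1', 'data2') and wraps the chain in a function so the number of rows per category becomes a parameter.

```python
import pandas as pd

# Hypothetical toy data, small enough to verify by eye.
df = pd.DataFrame({
    "category": ["A", "B", "A", "B", "A", "B"],
    "data1": [5, 7, 9, 1, 3, 8],
    "data2": [10, 20, 30, 40, 50, 60],
})


def top_n_per_group(frame, n):
    """Return the n rows with the largest 'data1' in each 'category'.

    Sorting first puts each category's rows in descending 'data1' order,
    so GroupBy.head(n) then keeps the n largest rows per category.
    """
    return (
        frame.sort_values(by=["category", "data1"], ascending=[True, False])
        .groupby("category")
        .head(n)
        .reset_index(drop=True)
    )


top2 = top_n_per_group(df, 2)
print(top2)
```

Because the frame is sorted only once and no per-category slices are built, the same call works unchanged if more categories are added later.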