
Sorting A Pandas Dataframe By The Order Of A List

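So I have a pandas DataFrame, df, with columns that represent taxonomic classification (i.e. Kingdom, Phylum, Class, etc.). I also have a list of taxonomic labels that correspond to the values in the Class column, and I want to sort df so that its rows follow the order of that list.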

Solution 1:

You could make the Class column the DataFrame's index:

df = df.set_index('Class')

and then use df.loc to select the rows by label, in the order given by class_list:

df.loc[class_list]

Minimal example:

>>> import pandas as pd
>>> df = pd.DataFrame({'Class': ['Gammaproteobacteria', 'Bacteroidetes', 'Negativicutes'], 'Number': [3, 5, 6]})
>>> df
                 Class  Number
0  Gammaproteobacteria       3
1        Bacteroidetes       5
2        Negativicutes       6

>>> df = df.set_index('Class')
>>> df.loc[['Bacteroidetes', 'Negativicutes', 'Gammaproteobacteria']]
                     Number
Class
Bacteroidetes             5
Negativicutes             6
Gammaproteobacteria       3
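If you want Class back as an ordinary column after reordering, the same idea can be chained and finished with reset_index(). This is a self-contained sketch using the toy data from the minimal example; the name `ordered` is just an illustrative variable.

```python
import pandas as pd

# Same toy data as the minimal example
df = pd.DataFrame({'Class': ['Gammaproteobacteria', 'Bacteroidetes', 'Negativicutes'],
                   'Number': [3, 5, 6]})
class_list = ['Bacteroidetes', 'Negativicutes', 'Gammaproteobacteria']

# Index by Class, pick the rows in list order, then restore Class as a column
ordered = df.set_index('Class').loc[class_list].reset_index()
print(ordered)
```

Because df.loc is label-based, every entry of class_list must exist in the index for this to work.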

Solution 2:

Alex's solution (Solution 1) doesn't work if your original DataFrame does not contain every label in the ordered list. For example, if your input data at some point does not contain "Negativicutes", df.loc raises a KeyError (in pandas 1.0 and later) and the script fails. One way around this is to select the rows for each label separately, collect them in a list, and concatenate them at the end; a label with no matching rows simply contributes an empty frame. For example:

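import pandas as pd

ordered_classes = ['Bacteroidetes', 'Negativicutes', 'Gammaproteobacteria']

# df still has 'Class' as a regular column here (no set_index)
df_list = []
for label in ordered_classes:
    # Rows for this label; an empty frame if the label is absent
    df_list.append(df[df['Class'] == label])

ordered_df = pd.concat(df_list)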
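Note that the loop above silently drops rows whose Class is not in ordered_classes. If those rows should be kept, one alternative (a different technique, not part of either answer) is DataFrame.sort_values with a key function, available in pandas 1.1 and later. The sketch below maps each label to its position in the list; unlisted labels map to NaN and, with the default na_position='last', end up at the bottom. The toy data and the `rank` dictionary are hypothetical.

```python
import pandas as pd

# Hypothetical data: 'Negativicutes' is absent, 'Clostridia' is not in the list
df = pd.DataFrame({'Class': ['Gammaproteobacteria', 'Clostridia', 'Bacteroidetes'],
                   'Number': [3, 7, 5]})
ordered_classes = ['Bacteroidetes', 'Negativicutes', 'Gammaproteobacteria']

# Position of each label in the desired order
rank = {label: pos for pos, label in enumerate(ordered_classes)}

# key= receives the 'Class' column as a Series and must return a Series of the
# same length; labels missing from rank become NaN and are sorted last
ordered_df = df.sort_values('Class', key=lambda s: s.map(rank))
print(ordered_df)
```

Unlike the concat loop, this keeps every row and the original index labels, and missing list entries need no special handling.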
