Sorting A Pandas Dataframe By The Order Of A List
So I have a pandas DataFrame, df, with columns that represent taxonomical classification (i.e. Kingdom, Phylum, Class etc...) I also have a list of taxonomic labels that correspond
Solution 1:
You could make the Class
column your index column
df = df.set_index('Class')
and then use df.loc
to reindex the DataFrame with class_list
:
df.loc[class_list]
Minimal example:
>>>df = pd.DataFrame({'Class': ['Gammaproteobacteria', 'Bacteroidetes', 'Negativicutes'], 'Number': [3, 5, 6]})>>>df
Class Number
0 Gammaproteobacteria 3
1 Bacteroidetes 5
2 Negativicutes 6
>>>df = df.set_index('Class')>>>df.loc[['Bacteroidetes', 'Negativicutes', 'Gammaproteobacteria']]
Number
Bacteroidetes 5
Negativicutes 6
Gammaproteobacteria 3
Solution 2:
Alex's solution doesn't work if your original dataframe does not contain all of the elements in the ordered list i.e.: if your input data at some point in time does not contain "Negativicutes", this script will fail. One way to get past this is to append your df's in a list and concatenate them at the end. For example:
ordered_classes = ['Bacteroidetes', 'Negativicutes', 'Gammaproteobacteria']
df_list = []
for i in ordered_classes:
df_list.append(df[df['Class']==i])
ordered_df = pd.concat(df_list)
Post a Comment for "Sorting A Pandas Dataframe By The Order Of A List"