Pandas.groupby.nsmallest Drops Multiindex When Dataframe Is Presorted
I am using pandas (0.22.0, python version 3.6.4) .groupby with the .nsmallest method to find the smallest items in each group of a dataframe. Here is an example dataframe: >>
Solution 1:
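Interesting. I think you have found a bug in the pandas SeriesGroupBy object when the dataframe is presorted.
As a workaround, we can use the DataFrameGroupBy object instead and reset the index, so that sorted and unsorted input give the same result (I do still believe the SeriesGroupBy behavior is a bug).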
import pandas as pd
df = pd.DataFrame({'a': ['foo', 'foo', 'foo', 'foo',
                         'bar', 'bar', 'bar', 'bar', 'bar',
                         'qux', 'qux', 'qux'],
                   'b': ['baz', 'baz', 'baz', 'bat',
                         'baz', 'baz', 'bat', 'bat', 'bat',
                         'baz', 'bat', 'bat'],
                   'c': [1, 3, 2, 5,
                         6, 4, 9, 12, 7,
                         10, 8, 11]})
df2 = df.sort_values('c', ascending=True)
df_sorted = df2.groupby(['a','b']).apply(lambda x: x.nsmallest(n=3, columns='c')).reset_index(drop=True)
df_unsorted = df.groupby(['a','b']).apply(lambda x: x.nsmallest(n=3, columns='c')).reset_index(drop=True)
# equals() compares values, dtypes and index of the two frames
df_sorted.equals(df_unsorted)
Output:
True
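A side note on the comparison step: the built-in all() applied to a DataFrame iterates over its column labels rather than its values, so an element-wise reduction or DataFrame.equals is the safer check. A minimal sketch, using two small made-up frames (left and right are illustrative names, not part of the original question):

```python
import pandas as pd

# Two hypothetical frames that differ in exactly one cell
left = pd.DataFrame({'a': [1, 2], 'b': [3, 4]})
right = pd.DataFrame({'a': [1, 2], 'b': [3, 5]})

# Built-in all() iterates over the column labels 'a' and 'b',
# which are non-empty strings, so this is True even though the frames differ
labels_check = all(left.eq(right))

# Reduce over rows, then over columns, to test every cell
elementwise_check = bool(left.eq(right).all().all())

# equals() also compares dtypes and index
equals_check = left.equals(right)

print(labels_check, elementwise_check, equals_check)
```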
Print df_sorted and df_unsorted:
print(df_sorted)
      a    b   c
0   bar  bat   7
1   bar  bat   9
2   bar  bat  12
3   bar  baz   4
4   bar  baz   6
5   foo  bat   5
6   foo  baz   1
7   foo  baz   2
8   foo  baz   3
9   qux  bat   8
10  qux  bat  11
11  qux  baz  10
print(df_unsorted)
      a    b   c
0   bar  bat   7
1   bar  bat   9
2   bar  bat  12
3   bar  baz   4
4   bar  baz   6
5   foo  bat   5
6   foo  baz   1
7   foo  baz   2
8   foo  baz   3
9   qux  bat   8
10  qux  bat  11
11  qux  baz  10
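The workaround above discards the group keys from the index. If the goal is to keep them as a MultiIndex regardless of how the input is ordered, one possible alternative (not the method from the answer) is to sort, take the first n rows of each group with head, and then set the keys as the index explicitly. The helper name smallest_per_group is hypothetical, and the sketch assumes the same example data as above. When values tie, which row is kept follows the sort order, which may differ from nsmallest.

```python
import pandas as pd

df = pd.DataFrame({
    'a': ['foo', 'foo', 'foo', 'foo', 'bar', 'bar', 'bar', 'bar', 'bar', 'qux', 'qux', 'qux'],
    'b': ['baz', 'baz', 'baz', 'bat', 'baz', 'baz', 'bat', 'bat', 'bat', 'baz', 'bat', 'bat'],
    'c': [1, 3, 2, 5, 6, 4, 9, 12, 7, 10, 8, 11],
})


def smallest_per_group(frame, keys, value, n):
    """Hypothetical helper: the n smallest `value` rows per group, indexed by `keys`."""
    return (frame.sort_values(keys + [value])  # order rows by group keys, then by value
                 .groupby(keys)
                 .head(n)                      # keep the first n rows of each group
                 .set_index(keys))             # group keys become the MultiIndex


from_unsorted = smallest_per_group(df, ['a', 'b'], 'c', 3)
from_presorted = smallest_per_group(df.sort_values('c'), ['a', 'b'], 'c', 3)

print(from_unsorted)
print(from_unsorted.equals(from_presorted))
```

Because the index is built with set_index rather than inferred by groupby, it does not depend on whether the dataframe was presorted.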