
Pandas.groupby.nsmallest Drops Multiindex When Dataframe Is Presorted

I am using pandas 0.22.0 (Python 3.6.4) and calling .groupby followed by .nsmallest to find the smallest items in each group of a dataframe. Here is an example dataframe: >>

Solution 1:

Interesting. I think you have found a bug in how the pandas SeriesGroupBy object handles presorted dataframes.
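If you want to check whether your installed pandas version is affected, one option is to compare the index depth of the SeriesGroupBy result for the original and the presorted frame. This is only a minimal sketch: the question text is truncated above, so the exact call `groupby(['a', 'b'])['c'].nsmallest(3)` is an assumption, and the names `df_presorted`, `s_unsorted` and `s_sorted` are illustrative.

```python
import pandas as pd

df = pd.DataFrame({'a': ['foo', 'foo', 'foo', 'foo',
                         'bar', 'bar', 'bar', 'bar', 'bar',
                         'qux', 'qux', 'qux'],
                   'b': ['baz', 'baz', 'baz', 'bat',
                         'baz', 'baz', 'bat', 'bat', 'bat',
                         'baz', 'bat', 'bat'],
                   'c': [1, 3, 2, 5,
                         6, 4, 9, 12, 7,
                         10, 8, 11]})

# Presort by the value column, as described in the question.
df_presorted = df.sort_values('c', ascending=True)

# Assumed form of the question's call: nsmallest on a SeriesGroupBy.
s_unsorted = df.groupby(['a', 'b'])['c'].nsmallest(3)
s_sorted = df_presorted.groupby(['a', 'b'])['c'].nsmallest(3)

# If the two depths differ, the group keys were dropped for one of the inputs.
print(s_unsorted.index.nlevels, s_sorted.index.nlevels)
```

If both numbers match, the group keys survive in both cases on your version.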

Instead, I think we can use the pandas DataFrameGroupBy object (I do still believe you have hit a bug in the SeriesGroupBy case).

import pandas as pd

df = pd.DataFrame({'a': ['foo', 'foo', 'foo', 'foo',
                         'bar', 'bar', 'bar', 'bar', 'bar',
                         'qux', 'qux', 'qux'],
                   'b': ['baz', 'baz', 'baz', 'bat',
                         'baz', 'baz', 'bat', 'bat', 'bat',
                         'baz', 'bat', 'bat'],
                   'c': [1, 3, 2, 5,
                         6, 4, 9, 12, 7,
                         10, 8, 11]})

df2 = df.sort_values('c', ascending=True)

df_sorted = df2.groupby(['a','b']).apply(lambda x: x.nsmallest(n=3, columns='c')).reset_index(drop=True)

df_unsorted = df.groupby(['a','b']).apply(lambda x: x.nsmallest(n=3, columns='c')).reset_index(drop=True)

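# Compare element-wise, then reduce over rows and columns.
# (The built-in all() on a DataFrame would only iterate over the column labels.)
df_sorted.eq(df_unsorted).all().all()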

Output:

True

Print df_sorted and df_unsorted:

print(df_sorted)

      a    b   c
0   bar  bat   7
1   bar  bat   9
2   bar  bat  12
3   bar  baz   4
4   bar  baz   6
5   foo  bat   5
6   foo  baz   1
7   foo  baz   2
8   foo  baz   3
9   qux  bat   8
10  qux  bat  11
11  qux  baz  10

print(df_unsorted)

      a    b   c
0   bar  bat   7
1   bar  bat   9
2   bar  bat  12
3   bar  baz   4
4   bar  baz   6
5   foo  bat   5
6   foo  baz   1
7   foo  baz   2
8   foo  baz   3
9   qux  bat   8
10  qux  bat  11
11  qux  baz  10
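As an aside, a different technique that avoids nsmallest altogether is to sort once and then take the first rows of each group with GroupBy.head. This is not the approach used in the answer, just a sketch that happens to select the same rows for this example data, where column c has no ties. The name `top3` is illustrative.

```python
import pandas as pd

df = pd.DataFrame({'a': ['foo', 'foo', 'foo', 'foo',
                         'bar', 'bar', 'bar', 'bar', 'bar',
                         'qux', 'qux', 'qux'],
                   'b': ['baz', 'baz', 'baz', 'bat',
                         'baz', 'baz', 'bat', 'bat', 'bat',
                         'baz', 'bat', 'bat'],
                   'c': [1, 3, 2, 5,
                         6, 4, 9, 12, 7,
                         10, 8, 11]})

# Sort by the group keys and the value column, then keep at most
# three rows per (a, b) group; head() preserves the sorted row order.
top3 = (df.sort_values(['a', 'b', 'c'])
          .groupby(['a', 'b'])
          .head(3)
          .reset_index(drop=True))

print(top3)
```

Because the result is built from a plain sort plus head, it does not depend on how nsmallest treats the index of a presorted frame. With ties in c, the rows chosen may differ from what nsmallest keeps.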
