
Pandas: Conditional Filtering Based On The Last Row Of Each Dataframe Within Another Dataframe

I have a dataframe df as follows:

         Date Group  Value  Duration
0  2018-01-01     A     20        30
1  2018-02-01     A     10        60
2  2018-03-01     A     25        88
3  2018-01-01     B     15       180
4  2018-02-01     B     30       210
5  2018-03-01     B     25       238

I want to keep only the rows of groups whose maximum Duration is at least 90.

Solution 1:

You can test whether the maximal value per group is higher than or equal to 90 with GroupBy.transform, and then filter by boolean indexing:

df = df[df.groupby('Group')['Duration'].transform('max') >= 90]
# alternative:
# df = df[df.groupby('Group')['Duration'].transform('max').ge(90)]
print (df)
         Date Group  Value  Duration
3  2018-01-01     B     15       180
4  2018-02-01     B     30       210
5  2018-03-01     B     25       238
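The key point is that transform('max') broadcasts each group's maximum back to every row, so the comparison yields a boolean mask with the same length as the dataframe. A minimal runnable sketch, using sample data reconstructed from the question and the output above:

```python
import pandas as pd

# Sample data reconstructed from the question and the answer's output
df = pd.DataFrame({
    'Date': ['2018-01-01', '2018-02-01', '2018-03-01',
             '2018-01-01', '2018-02-01', '2018-03-01'],
    'Group': ['A', 'A', 'A', 'B', 'B', 'B'],
    'Value': [20, 10, 25, 15, 30, 25],
    'Duration': [30, 60, 88, 180, 210, 238],
})

# transform('max') returns one value per row (the max of that row's group),
# so the >= comparison produces a row-aligned boolean mask
mask = df.groupby('Group')['Duration'].transform('max') >= 90
out = df[mask]
print(out)
```

Group A's maximum Duration is 88, so all of its rows are dropped; group B's maximum is 238, so all of its rows are kept.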

EDIT: I suggest not using filter for per-group filtration, because it is slow:

#1k rows
import numpy as np
import pandas as pd

np.random.seed(123)
N = 1000
df = pd.DataFrame({'Group': np.random.randint(100, size=N),
                   'Duration': np.random.randint(200, size=N)})

#print (df)

In [24]: %timeit df.groupby('Group').filter(lambda x: x.Duration.max()>=90)
39.8 ms ± 1.15 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)

In [25]: %timeit df[df.groupby('Group')['Duration'].transform('max') >= 90]
3.12 ms ± 37.1 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

#100k rows
np.random.seed(123)
N = 100000
df = pd.DataFrame({'Group': np.random.randint(1000, size=N),
                   'Duration':np.random.randint(200,size=N)})

#print (df)


In [29]: %timeit df[df.groupby('Group')['Duration'].transform('max') >= 90]
11.8 ms ± 153 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

In [30]: %timeit df.groupby('Group').filter(lambda x: x.Duration.max()>=90)
394 ms ± 8.92 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
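Another vectorised option (not in the original answer, offered here as a hedged alternative): find the group labels that ever reach 90 and select them with isin. On the same synthetic data it returns exactly the same rows as the transform approach:

```python
import numpy as np
import pandas as pd

np.random.seed(123)
N = 1000
df = pd.DataFrame({'Group': np.random.randint(100, size=N),
                   'Duration': np.random.randint(200, size=N)})

# Labels of groups containing at least one Duration >= 90,
# then a fast membership mask over the whole frame
keep = df.loc[df['Duration'] >= 90, 'Group'].unique()
out_isin = df[df['Group'].isin(keep)]

# Same rows, same order, same index as the transform approach
out_transform = df[df.groupby('Group')['Duration'].transform('max') >= 90]
assert out_isin.equals(out_transform)
```

Like transform, this avoids calling a Python lambda once per group, which is what makes GroupBy.filter slow as the number of groups grows.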

Solution 2:

You can also use filter.

df.groupby('Group').filter(lambda x: x.Duration.max()>=90)

         Date Group  Value  Duration
3  2018-01-01     B     15       180
4  2018-02-01     B     30       210
5  2018-03-01     B     25       238
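For completeness, a self-contained version of this solution on the question's sample data (reconstructed, as above): GroupBy.filter keeps or drops entire groups according to the predicate's boolean return value.

```python
import pandas as pd

# Sample data reconstructed from the question
df = pd.DataFrame({
    'Date': ['2018-01-01', '2018-02-01', '2018-03-01',
             '2018-01-01', '2018-02-01', '2018-03-01'],
    'Group': ['A', 'A', 'A', 'B', 'B', 'B'],
    'Value': [20, 10, 25, 15, 30, 25],
    'Duration': [30, 60, 88, 180, 210, 238],
})

# The lambda runs once per group; groups where it returns False are dropped
out = df.groupby('Group').filter(lambda g: g['Duration'].max() >= 90)
print(out)
```

This reads naturally for small data, but as the benchmarks in Solution 1 show, the per-group Python call makes it much slower than the transform-based mask when there are many groups.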
