Selecting All Rows Before A Certain Entry In A Pandas Dataframe
How can I select the rows that come before the first appearance of a certain value in a column? I have a dataset of user activity with timestamps, recorded as follows:   df = pd.DataFrame([{'use
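The constructor call in the question is cut off, but the same rows are printed in full in Solution 2. A minimal sketch that rebuilds the sample data from that printed table, assuming those nine rows are the whole dataset (the column order here is a guess):

```python
import pandas as pd

# Values copied from the table printed in Solution 2; column order is assumed.
df = pd.DataFrame({
    'user_id':  [1, 1, 1, 1, 1, 1, 1, 2, 2],
    'date':     ['2017-09-01', '2017-09-02', '2017-09-03', '2017-09-04',
                 '2017-09-05', '2017-09-06', '2017-09-07',
                 '2017-09-04', '2017-09-06'],
    'activity': ['Open', 'Open', 'Open', 'Click', 'Purchase', 'Open', 'Open',
                 'Open', 'Purchase'],
})
print(df)
```

The goal is to keep, for each user_id, only the rows that come before that user's first 'Purchase'.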
Solution 1:
You can avoid explicit apply with
In [2862]: df[df['activity'].eq('Purchase').groupby(df['user_id']).cumsum().eq(0)]
Out[2862]:
  activity        date  user_id
0     Open  2017-09-01        1
1     Open  2017-09-02        1
2     Open  2017-09-03        1
3    Click  2017-09-04        1
7     Open  2017-09-04        2
Solution 2:
Use groupby and find all rows which are above the row where a user purchased some item. Then, use the mask to index.
df
   activity        date  user_id
0      Open  2017-09-01        1
1      Open  2017-09-02        1
2      Open  2017-09-03        1
3     Click  2017-09-04        1
4  Purchase  2017-09-05        1
5      Open  2017-09-06        1
6      Open  2017-09-07        1
7      Open  2017-09-04        2
8  Purchase  2017-09-06        2

# group_keys=False keeps the original row index on the result, so the mask
# aligns with df (pandas 2.x would otherwise prepend user_id to the index)
m = df.groupby('user_id', group_keys=False).activity\
      .apply(lambda x: (x == 'Purchase').cumsum()) == 0

df[m]
  activity        date  user_id
0     Open  2017-09-01        1
1     Open  2017-09-02        1
2     Open  2017-09-03        1
3    Click  2017-09-04        1
7     Open  2017-09-04        2

If your actual data isn't sorted like it is here, you could use df.sort_values and make sure it is:
df = df.sort_values(['user_id', 'date'])
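To see why the sort matters, here is a small sketch on a hypothetical shuffled frame (the rows and their order are made up for illustration). It assumes the dates are ISO-formatted strings, which sort chronologically as plain text, and it splits the mask into named steps:

```python
import pandas as pd

# Hypothetical, deliberately unsorted activity log.
df = pd.DataFrame({
    'user_id':  [2, 1, 1, 2, 1, 1],
    'date':     ['2017-09-06', '2017-09-05', '2017-09-01',
                 '2017-09-04', '2017-09-06', '2017-09-04'],
    'activity': ['Purchase', 'Purchase', 'Open', 'Open', 'Open', 'Click'],
})

# Put each user's rows in chronological order first.
df = df.sort_values(['user_id', 'date'])

# True on every purchase row.
is_purchase = df['activity'].eq('Purchase')

# Running count of purchases seen so far, restarted for each user.
seen = is_purchase.groupby(df['user_id']).cumsum()

# Rows where no purchase has been seen yet come before the first purchase.
before = df[seen.eq(0)]
print(before)
```

Without the sort, the cumulative sum would follow the storage order of the rows rather than the timeline, so "before" would not mean "earlier in time".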
Solution 3:
Group the boolean mask by user_id and take its cumulative sum with SeriesGroupBy.cumsum, convert the result to bool, invert it, and filter with boolean indexing:
# if necessary
# df = df.sort_values(['user_id', 'date'])
df = df[~df['activity'].eq('Purchase').groupby(df['user_id']).cumsum().astype(bool)]
print (df)
   user_id        date activity
0        1  2017-09-01     Open
1        1  2017-09-02     Open
2        1  2017-09-03     Open
3        1  2017-09-04    Click
7        2  2017-09-04     Open
Detail (the mask, evaluated on the original df before it is filtered):
print (~df['activity'].eq('Purchase').groupby(df['user_id']).cumsum().astype(bool))
0     True
1     True
2     True
3     True
4    False
5    False
6    False
7     True
8    False
Name: activity, dtype: bool
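The same mask can be wrapped in a small helper. This is only a sketch: rows_before_first and its parameter names are hypothetical, not part of pandas, and the sample frame is made up to show two edge cases (a user whose first row is the purchase, and a user who never purchases). It assumes the rows are already in the desired order within each group.

```python
import pandas as pd

def rows_before_first(df, by, column, value):
    """Return the rows of each `by` group that precede the first `column == value`."""
    hit = df[column].eq(value)            # True where the target value occurs
    seen = hit.groupby(df[by]).cumsum()   # per-group running count of hits
    return df[seen.eq(0)]                 # keep rows with no hit so far

# Hypothetical data: user 2 starts with a purchase, user 3 never purchases.
df = pd.DataFrame({
    'user_id':  [1, 1, 1, 2, 2, 3, 3],
    'activity': ['Open', 'Purchase', 'Open', 'Purchase', 'Open', 'Open', 'Click'],
})

result = rows_before_first(df, 'user_id', 'activity', 'Purchase')
print(result)
```

A user whose first row is already a purchase contributes no rows, while a user with no purchase at all keeps every row, because the running count never leaves zero.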