Python Pandas: Resampling Multivariate Time Series With A Groupby

April 20, 2024

I have data in the following general format that I would like to resample to 30 day time series windows: 'customer_id','transaction_dt','product','price','units' 1,2004-01-02,thi

Solution 1:

Edited with a new solution. I think you can map each transaction_dt to a 30-day Period object and then group on that period.

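import numpy as np
import pandas as pd

# Consecutive 30-day periods covering the whole date range
p = pd.period_range('2004-1-1', '12-31-2018', freq='30D')

def find_period(v):
    # Index of the first period whose end time is after the transaction date
    p_idx = np.argmax(v < p.end_time)
    return p[p_idx]

# transaction_dt must already be a datetime column
df['period'] = df['transaction_dt'].apply(find_period)
df

   customer_id transaction_dt product  price  units      period
0            1     2004-01-02  thing1     25     47  2004-01-01
1            1     2004-01-17  thing2    150      8  2004-01-01
2            2     2004-01-29  thing2    150     25  2004-01-01
3            3     2017-07-15  thing3     55     17  2017-06-21
4            3     2016-05-12  thing3     55     47  2016-04-27
5            4     2012-02-23  thing2    150     22  2012-02-18
6            4     2009-10-10  thing1     25     12  2009-10-01
7            4     2014-04-04  thing2    150      2  2014-03-09
8            5     2008-07-09  thing2    150     43  2008-07-08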
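If the row-by-row apply turns out to be slow on a large frame, the same 30-day windows can probably be computed with plain date arithmetic. The following is only a sketch, not part of the original answer: it assumes the windows are anchored at 2004-01-01 as above, and the names origin, window_days, window_idx, window_start and window_end are made up for illustration.

```python
import pandas as pd

# A few sample rows taken from the table above.
df = pd.DataFrame({
    "customer_id": [1, 1, 2, 3, 5],
    "transaction_dt": pd.to_datetime(
        ["2004-01-02", "2004-01-17", "2004-01-29", "2017-07-15", "2008-07-09"]
    ),
})

origin = pd.Timestamp("2004-01-01")  # assumed start of the first window
window_days = 30

# Number of whole 30-day windows elapsed since the origin.
window_idx = (df["transaction_dt"] - origin).dt.days // window_days

# Start of the window each transaction falls into.
df["window_start"] = origin + pd.to_timedelta(window_idx * window_days, unit="D")

# Exclusive end of the window (start of the next one). Note that this
# differs from Period.end_time, which is the last instant inside the window.
df["window_end"] = df["window_start"] + pd.Timedelta(days=window_days)

print(df)
```

On these rows the window_start values should line up with the period column shown above, and grouping on window_start instead of period should give the same groups.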

We can now use this dataframe to get, for each customer and period, the concatenation of products, the units-weighted average of price, and the sum of units. We then use the Period's start_time and end_time attributes to get the window boundaries.

def my_funcs(df):
    data = {}
    data['product'] = '/'.join(df['product'].tolist())
    data['units'] = df.units.sum()
    data['price'] = np.average(df['price'], weights=df['units'])
    data['transaction_dt'] = df['transaction_dt'].iloc[0]
    data['window_start_time'] = df['period'].iloc[0].start_time
    data['window_end_time'] = df['period'].iloc[0].end_time
    return pd.Series(data, index=['transaction_dt', 'product', 'price','units', 
                                  'window_start_time', 'window_end_time'])

df.groupby(['customer_id', 'period']).apply(my_funcs).reset_index('period', drop=True)
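As a possible alternative to the custom apply function, the same summary can be sketched with named aggregation. This is an assumption-laden variant rather than the answer's method: it rebuilds the nine sample rows from the table above, uses a hypothetical window_start column computed by date arithmetic in place of period, and gets the weighted average by summing a helper revenue column (price times units) and dividing by total units.

```python
import pandas as pd

# The nine sample rows from the table above.
df = pd.DataFrame({
    "customer_id": [1, 1, 2, 3, 3, 4, 4, 4, 5],
    "transaction_dt": pd.to_datetime([
        "2004-01-02", "2004-01-17", "2004-01-29", "2017-07-15", "2016-05-12",
        "2012-02-23", "2009-10-10", "2014-04-04", "2008-07-09",
    ]),
    "product": ["thing1", "thing2", "thing2", "thing3", "thing3",
                "thing2", "thing1", "thing2", "thing2"],
    "price": [25, 150, 150, 55, 55, 150, 25, 150, 150],
    "units": [47, 8, 25, 17, 47, 22, 12, 2, 43],
})

# Hypothetical stand-in for the 'period' column: 30-day windows from 2004-01-01.
origin = pd.Timestamp("2004-01-01")
days = (df["transaction_dt"] - origin).dt.days
df["window_start"] = origin + pd.to_timedelta(days // 30 * 30, unit="D")

# Helper column so the weighted average reduces to two sums.
df["revenue"] = df["price"] * df["units"]

out = (
    df.groupby(["customer_id", "window_start"])
      .agg(
          transaction_dt=("transaction_dt", "first"),
          product=("product", "/".join),
          units=("units", "sum"),
          revenue=("revenue", "sum"),
      )
      .reset_index()
)

# Units-weighted average price per customer and window.
out["price"] = out["revenue"] / out["units"]
out = out.drop(columns="revenue")

print(out)
```

Customer 1 has two transactions in the first window, so that row should combine both products and carry a price between 25 and 150; every other group here contains a single transaction.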
