Aggregate By Repeated Datetime Index With Different Identifiers In A Column On A Pandas Dataframe
Solution 1:
For the groupby to return a DataFrame instead of a Series, use double subscription [[]]:
by_date = df.groupby(df.index.date)[['value']].mean()
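This then allows you to group by month and generate a boxplot. Because the daily index holds plain datetime.date objects (which have no .month accessor as an Index), convert it back to datetimes before taking the month:
by_month = by_date.groupby(pd.to_datetime(by_date.index).month)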
by_month.boxplot(subplots=False)
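A minimal, self-contained sketch of the Solution 1 pipeline, assuming a hypothetical DataFrame with a repeated DatetimeIndex, an "id" column and a "value" column (the data and column names are illustrative, not from the original question). The boxplot call is left as a comment so the example runs without matplotlib; instead it summarises each month's daily means.

```python
import pandas as pd

# Hypothetical sample data: several readings per day (repeated timestamps per day)
# with an "id" column distinguishing the series. Column names are assumptions.
idx = pd.DatetimeIndex(
    ["2007-01-01 09:00", "2007-01-01 17:00",
     "2007-01-02 09:00", "2007-02-01 09:00",
     "2007-02-01 17:00", "2007-02-03 09:00"]
)
df = pd.DataFrame({"id": ["a", "b", "a", "a", "b", "b"],
                   "value": [1.0, 3.0, 2.0, 4.0, 6.0, 5.0]}, index=idx)

# [['value']] keeps the result a DataFrame: one row per calendar day.
by_date = df.groupby(df.index.date)[["value"]].mean()

# The day keys are datetime.date objects, so convert them to datetimes
# before asking for the month number.
by_month = by_date.groupby(pd.to_datetime(by_date.index).month)

# by_month.boxplot(subplots=False) would draw the plot (requires matplotlib);
# here we only summarise the daily means within each month.
summary = by_month["value"].agg(["count", "mean"])
print(summary)
```

Selecting [["value"]] before .mean() also keeps the non-numeric "id" column out of the aggregation.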
Double subscription is a subtle feature that is not immediately obvious. Normally df[col] returns a single column as a Series, but passing a list of column labels, df[col_list], returns a DataFrame; written out, that is df[[col_a, col_b]]. It follows that df[[col_a]] also returns a DataFrame, because we have passed a list with a single element. This is not the same as df[col_a], where we pass a single label and get that column back as a Series.
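A small sketch of the selection rule described above, on a hypothetical two-column DataFrame: a single label gives a Series, any list of labels (even a one-element list) gives a DataFrame, and the same rule carries over to column selection on a groupby object.

```python
import pandas as pd

# Hypothetical two-column frame used only to show the return types.
df = pd.DataFrame({"value": [1.0, 2.0], "id": ["a", "b"]})

col = df["value"]            # single label -> Series
sub = df[["value"]]          # one-element list -> one-column DataFrame
both = df[["value", "id"]]   # longer list -> DataFrame

print(type(col).__name__, type(sub).__name__, type(both).__name__)

# The same rule applies after a groupby.
g = df.groupby("id")
print(type(g["value"].sum()).__name__, type(g[["value"]].sum()).__name__)
```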
Solution 2:
When you grouped by date, you converted the index values from pandas Timestamps to plain datetime.date objects:
>>> type(df.index[0])
pandas.tslib.Timestamp
>>> type(by_date.index[0])
datetime.date
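A sketch that reproduces the type change on hypothetical data. It also shows one alternative not used in the original answer: grouping by df.index.normalize(), which truncates each Timestamp to midnight and so keeps a DatetimeIndex (with .month and .to_period available) instead of an object index of datetime.date values.

```python
import datetime
import pandas as pd

# Hypothetical data with two readings on the same day.
idx = pd.DatetimeIndex(["2007-01-01 09:00", "2007-01-01 17:00", "2007-02-01 09:00"])
df = pd.DataFrame({"value": [1.0, 3.0, 4.0]}, index=idx)

# Grouping by .date turns the keys into datetime.date objects.
by_date = df.groupby(df.index.date)[["value"]].mean()
print(type(df.index[0]))        # a pandas Timestamp
print(type(by_date.index[0]))   # a plain datetime.date
print(by_date.index.dtype)      # object dtype, so no datetime accessors

# Alternative: normalize() keeps Timestamps, so the result has a DatetimeIndex.
by_day_ts = df.groupby(df.index.normalize())[["value"]].mean()
print(type(by_day_ts.index).__name__)
```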
If you convert that index to monthly Periods, you can group by month easily:
>>> by_date.index = pd.DatetimeIndex(by_date.index).to_period('M')
>>> by_date.groupby(by_date.index).value.sum()
2007-01-01    2.313915
2007-02-01    0.769883
2008-01-01    2.012760
2008-02-01    0.294140
Name: value, dtype: float64
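A runnable sketch of the Period approach on hypothetical data (values and column names are assumptions). Route 1 follows the answer: daily means first, then a monthly PeriodIndex built from the datetime.date keys (pd.to_datetime is used here for that conversion). Route 2 is an alternative not in the original answer: converting the raw DatetimeIndex straight to periods with to_period("M").

```python
import pandas as pd

# Hypothetical data with repeated timestamps (several readings on one day).
idx = pd.DatetimeIndex(["2007-01-01", "2007-01-01", "2007-01-15",
                        "2007-02-01", "2007-02-01", "2008-01-10"])
df = pd.DataFrame({"id": ["a", "b", "a", "a", "b", "a"],
                   "value": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]}, index=idx)

# Route 1: daily means, then rebuild datetimes from the datetime.date keys
# and convert them to monthly periods before grouping.
by_date = df.groupby(df.index.date)[["value"]].mean()
by_date.index = pd.to_datetime(by_date.index).to_period("M")
monthly_from_days = by_date.groupby(by_date.index)["value"].sum()

# Route 2: group the raw rows directly by their monthly period.
monthly_raw = df.groupby(df.index.to_period("M"))["value"].sum()

print(monthly_from_days)
print(monthly_raw)
```

The two routes answer different questions: Route 1 sums the daily means within each month, while Route 2 sums every raw reading, so they only agree when each day has a single row.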