Pandas Dataframe, Adding Duplicate Columns Together
I have a really large DataFrame with duplicate column names, but the values under them are different. I want to merge the duplicate columns together and add their values. This really larg
Solution 1:
I would propose to use groupby:
df = df.groupby(axis=1, level=0).sum()
In order to make it work also for MultiIndex, one can do:
if df.columns.duplicated().any():
    # Group on every column level; a one-element list also works for a plain Index
    all_levels = list(range(df.columns.nlevels))
    df = df.groupby(axis=1, level=all_levels).sum()
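Note that groupby(axis=1) is deprecated in recent pandas releases (2.1 and later). Below is a minimal sketch of the same idea that avoids the axis argument by transposing first. The helper name sum_duplicate_columns and the sample data are made up for illustration, and transposing a frame with mixed dtypes may change dtypes, so treat this as a starting point rather than a drop-in replacement.

```python
import pandas as pd

def sum_duplicate_columns(df):
    """Sum columns that share the same label (hypothetical helper)."""
    if not df.columns.duplicated().any():
        return df
    # Group on every column level so a MultiIndex is handled too.
    levels = list(range(df.columns.nlevels))
    # Transpose so column labels become the row index, group, then transpose back.
    return df.T.groupby(level=levels).sum().T

# Made-up sample with a two-level column index and one duplicated label.
cols = pd.MultiIndex.from_tuples([("a", "x"), ("a", "x"), ("b", "y")])
df = pd.DataFrame([[1, 2, 3], [4, 5, 6]], columns=cols)
result = sum_duplicate_columns(df)
```

Here the two ("a", "x") columns collapse into one column holding their row-wise sum, while ("b", "y") is left unchanged.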
EDIT
Instead of using groupby, in pandas versions before 2.0 (where sum still accepts a level argument) one could simply do:
df = df.sum(axis=1, level=0)
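Be aware of NaNs: the procedures above skip them, so a group of duplicate columns that contains only NaN in some row sums to 0 there. To avoid that, one could use either skipna=False (any NaN in the group makes the result NaN) or min_count=1 (the result is NaN only when every value is NaN), depending on the use case: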
df = df.sum(axis=1, level=0, skipna=False)
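A short sketch of how the default behaviour and min_count=1 differ. It is written with a transposed groupby so that it does not depend on the level argument of sum; the sample values are invented for illustration.

```python
import numpy as np
import pandas as pd

# Made-up data: the second row has NaN in both 'Ruby' columns.
df = pd.DataFrame(
    [[1.0, 2.0, np.nan], [np.nan, 4.0, np.nan]],
    columns=["Ruby", "Python", "Ruby"],
)

grouped = df.T.groupby(level=0)
default_sum = grouped.sum().T               # all-NaN groups become 0
keep_nan_sum = grouped.sum(min_count=1).T   # all-NaN groups stay NaN
```

With min_count=1, a row keeps NaN only when none of the duplicate columns has a value for it; a partially filled row is still summed over the values that exist.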
Solution 2:
I'm not sure why you would want to keep the old columns of values if you are summing them, so here's a way to do it that replaces them with the sum:
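import pandas as pd
# x, y, z are the column values and a is the index (the years in the table below)
df = pd.DataFrame({'col1':x, 'col2':y, 'col3':z}, index=a)
df.columns = ['Ruby', 'Python', 'Ruby']
# Overwrite both 'Ruby' columns with their row-wise sum
df['Ruby'] = df['Ruby'].sum(axis=1)
# Transpose, drop the now-identical duplicate row, and transpose back
df = df.T.drop_duplicates()
df = df.T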
With a starting data frame that looks like:
      Ruby  Python  Ruby
2010     1       2     1
2011     2       4     3
2012     3       6     5
2013     4       8     7
2014     5      10     9
and then becomes:
      Ruby  Python
2010     2       2
2011     5       4
2012     8       6
2013    11       8
2014    14      10
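For reference, here is a self-contained sketch that reproduces the example end to end. The input values are read off the starting table, and it swaps in a transposed groupby for the drop_duplicates step, so it is a variation on the answer rather than the answer's exact code.

```python
import pandas as pd

df = pd.DataFrame(
    {"col1": [1, 2, 3, 4, 5], "col2": [2, 4, 6, 8, 10], "col3": [1, 3, 5, 7, 9]},
    index=[2010, 2011, 2012, 2013, 2014],
)
df.columns = ["Ruby", "Python", "Ruby"]

# sort=False keeps the labels in order of first appearance (Ruby, then Python).
summed = df.T.groupby(level=0, sort=False).sum().T
```

Grouping on the label avoids one pitfall of drop_duplicates on the transposed frame: drop_duplicates compares values, so it would also drop a differently named column that happens to hold exactly the same numbers as another one.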