
Pandas Dataframe, Adding Duplicate Columns Together

I have a really large DataFrame with duplicate column names, but the values under those columns are not duplicates. I want to merge the duplicate columns and add their values together. This really larg

Solution 1:

I would suggest using groupby:

df = df.groupby(axis=1, level=0).sum()  # on pandas >= 2.1 (axis deprecated) use df.T.groupby(level=0).sum().T
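To see what the grouping does, here is a small runnable sketch on a made-up frame (the labels and values are illustrative only). It transposes instead of passing axis=1, and it passes sort=False because groupby sorts the group keys by default, which would otherwise reorder the resulting columns alphabetically.

```python
import pandas as pd

# Made-up frame in which the label "a" appears twice
df = pd.DataFrame([[1, 10, 100], [2, 20, 200]], columns=["a", "b", "a"])

# Transpose so the column labels become the index, sum the rows that share a label,
# then transpose back; sort=False keeps labels in order of first appearance
summed = df.T.groupby(level=0, sort=False).sum().T
print(summed)
```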

To make it work for MultiIndex columns as well, one can do:

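if df.columns.duplicated().any():
    all_levels = df.columns.nlevels
    # a flat Index only has level 0 (level=1 would raise); a MultiIndex is grouped on all its levels
    all_levels = 0 if all_levels == 1 else list(range(all_levels))
    df = df.groupby(axis=1, level=all_levels).sum()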
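A runnable sketch of the MultiIndex case on a small made-up frame (labels and values are illustrative). It groups the transposed frame on every column level, so only columns whose full label tuples match are merged, and it avoids the axis keyword that pandas 2.1 deprecated.

```python
import pandas as pd

# Made-up two-level columns in which ("x", "p") appears twice
cols = pd.MultiIndex.from_tuples([("x", "p"), ("x", "q"), ("x", "p")])
df = pd.DataFrame([[1, 2, 3], [4, 5, 6]], columns=cols)

if df.columns.duplicated().any():
    nlevels = df.columns.nlevels
    # Level 0 for a flat Index; every level for a MultiIndex
    levels = 0 if nlevels == 1 else list(range(nlevels))
    # Each full column tuple becomes one group after transposing
    df = df.T.groupby(level=levels, sort=False).sum().T

print(df)
```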

EDIT

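Instead of using groupby, pandas versions before 2.0 also let you simply do:

df = df.sum(axis=1, level=0)  # pandas < 2.0 only: level= was deprecated in 1.3 and removed in 2.0

On current pandas, group by the column labels instead.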

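Be aware of NaNs, which the procedures above treat as 0 (a row whose duplicate columns are all NaN sums to 0). To avoid that, use either skipna=False (any NaN makes the sum NaN) or min_count=1 (the sum is NaN only when all values are NaN), depending on your use case:

df = df.sum(axis=1, level=0, skipna=False)  # pandas < 2.0 only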
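Because the level keyword is gone in pandas 2.0, here is a sketch of the same three NaN behaviours through groupby on a made-up frame. min_count is a documented parameter of the groupby sum; the skipna=False behaviour is reproduced here with a per-column lambda, which is one possible approach rather than the only one.

```python
import numpy as np
import pandas as pd

# Made-up frame: two columns both labelled "a", with some missing values
df = pd.DataFrame({"c1": [1.0, np.nan, np.nan], "c2": [2.0, 3.0, np.nan]})
df.columns = ["a", "a"]

g = df.T.groupby(level=0)

# Default: NaN is skipped, so an all-NaN row sums to 0.0
default = g.sum().T
# min_count=1: the result is NaN only if every value in the group is NaN
at_least_one = g.sum(min_count=1).T
# Equivalent of skipna=False: any NaN in the group makes the result NaN
strict = g.agg(lambda s: s.sum(skipna=False)).T

print(default, at_least_one, strict, sep="\n\n")
```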

Solution 2:

I'm not sure why you would want to keep the old columns of values if you are summing them, so here's a way to do it that drops them:

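import pandas as pd
# sample data taken from the table below (index = years)
df = pd.DataFrame({'col1': [1, 2, 3, 4, 5], 'col2': [2, 4, 6, 8, 10],
                   'col3': [1, 3, 5, 7, 9]}, index=range(2010, 2015))
df.columns = ['Ruby', 'Python', 'Ruby']
# df['Ruby'] selects both 'Ruby' columns; the row-wise sum is written into each of them
df['Ruby'] = df['Ruby'].sum(axis=1)
# keep the first column per label; this matches labels, so distinct columns with equal values survive
df = df.loc[:, ~df.columns.duplicated()]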

With a starting data frame that looks like:

      Ruby  Python  Ruby
2010     1       2     1
2011     2       4     3
2012     3       6     5
2013     4       8     7
2014     5      10     9

and then becomes:

      Ruby  Python
2010     2       2
2011     5       4
2012     8       6
2013    11       8
2014    14      10
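Solution 2 hard-codes the 'Ruby' label. A sketch generalizing the same idea to every repeated label might look like this; sum_duplicate_columns is a hypothetical helper name, and it assumes a flat (single-level) column Index.

```python
import pandas as pd


def sum_duplicate_columns(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: sum each group of same-named columns, keeping first-seen order.

    Assumes a flat (single-level) column Index.
    """
    # One column per label, in order of first appearance
    out = df.loc[:, ~df.columns.duplicated()].copy()
    # Only labels that occur more than once need summing
    for label in df.columns[df.columns.duplicated()].unique():
        # df[label] returns every column with that label; sum them row by row
        out[label] = df[label].sum(axis=1)
    return out


# Same data as the starting table in Solution 2
df = pd.DataFrame([[1, 2, 1], [2, 4, 3], [3, 6, 5], [4, 8, 7], [5, 10, 9]],
                  index=range(2010, 2015), columns=["Ruby", "Python", "Ruby"])
result = sum_duplicate_columns(df)
print(result)
```

Unlike the groupby approach, this keeps the original column order without any sort option and never transposes, so mixed column dtypes are left alone.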
