Best Strategy For Merging A Lot Of Data Frames Using Pandas

I'm trying to merge many data frames (a few thousand one-column tsv files) into a single csv file using pandas. I'm new to pandas (and Python, for that matter) and could use some i

Solution 1:

Look at the docs for merge: when called on a frame, the first parameter is the 'other' frame, and the on argument names the column(s) you want to merge on (I'm not actually sure what happens when you pass a DataFrame for it).

But, assuming your bird column is called 'bird', what you probably want is:

In [412]: df1.merge(df2, on='bird', how='outer').fillna(0)
Out[412]: 
        bird  value_x  value_y
0   bluebird       34       78
1  chickadee      168        0
2      eagle       10        0
3       hawk       67        3
4    sparrow        2      178
5  albatross        0       56
6    pelican        0       19
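
For anyone who wants to try the outer merge without the original files, here is a minimal sketch. The frames df1 and df2, the column name 'value', and the numbers are illustrative assumptions, not the asker's real data. The second half shows one possible way to extend the pairwise merge to a whole list of frames with functools.reduce; each value column is renamed first so the names do not collide.

```python
from functools import reduce

import pandas as pd

# Hypothetical stand-ins for two of the one-column files (values are illustrative).
df1 = pd.DataFrame({"bird": ["bluebird", "chickadee", "eagle", "hawk", "sparrow"],
                    "value": [34, 168, 10, 67, 2]})
df2 = pd.DataFrame({"bird": ["albatross", "bluebird", "hawk", "pelican", "sparrow"],
                    "value": [56, 78, 3, 19, 178]})

# Outer merge keeps birds found in either frame; missing counts become 0.
# Both frames have a 'value' column, so pandas adds the _x / _y suffixes.
merged = df1.merge(df2, on="bird", how="outer").fillna(0)

# For more than two frames, give each value column a unique name first,
# then fold the list with repeated outer merges.
frames = [df.rename(columns={"value": f"value_{i}"})
          for i, df in enumerate([df1, df2])]
merged_many = reduce(
    lambda left, right: left.merge(right, on="bird", how="outer"),
    frames,
).fillna(0)
```

Row order after an outer merge can differ between pandas versions, so it is safer to look rows up by bird name than by position. Repeated merging does a fresh join for every file, which may get slow with a few thousand inputs.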

Solution 2:

I would think the fastest way is to set the column you want to merge on as the index, create a list of the dataframes, and then pd.concat them. Something like this:

import os
import pandas as pd

directory = os.path.expanduser('~/home')
files = os.listdir(directory)
dfs = []
for filename in files:
    # Only read the tab-separated files.
    if filename.endswith('.tsv'):
        path = os.path.join(directory, filename)
        df = pd.read_table(path, sep='\t').set_index('bird')
        dfs.append(df)
# axis=1 places the frames side by side, aligned on the 'bird' index.
master_df = pd.concat(dfs, axis=1)
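
One thing to watch with the concat approach: if every file uses the same name for its value column, the combined frame ends up with thousands of identically named columns. Below is a minimal sketch of one way around that, assuming each file has a 'bird' column plus a single value column. The file names and contents are hypothetical and held in memory so the example runs on its own.

```python
import io

import pandas as pd

# Hypothetical file contents keyed by file name; real code would read from disk.
fake_files = {
    "site_a.tsv": "bird\tvalue\nbluebird\t34\nhawk\t67\n",
    "site_b.tsv": "bird\tvalue\nbluebird\t78\npelican\t19\n",
}

series_by_name = {}
for filename, text in fake_files.items():
    df = pd.read_csv(io.StringIO(text), sep="\t", index_col="bird")
    # Keep the single value column as a Series, labelled by its source file.
    series_by_name[filename] = df.iloc[:, 0]

# Passing a dict to concat uses the keys as column names; axis=1 aligns on the
# 'bird' index and takes the union of birds, leaving NaN where a file has none.
master_df = pd.concat(series_by_name, axis=1).fillna(0)

# Produce the combined table as CSV text (pass a path instead to write a file).
csv_text = master_df.to_csv()
```

Labelling each column by its source file also makes it easy to trace a value in the final csv back to the tsv it came from.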
