Best Strategy For Merging A Lot Of Data Frames Using Pandas
I'm trying to merge many (a few thousand one column tsv files) data frames into a single csv file using pandas. I'm new to pandas (and python for that matter) and could use some i
Solution 1:
Look at the docs for merge: when called from a frame, the first parameter is the 'other' frame, and the column(s) to merge on are given with the on keyword. The second positional parameter is how, so I'm not actually sure what happens when you pass a DataFrame there.
But, assuming your bird column is called 'bird', what you probably want is:
In [412]: df1.merge(df2, on='bird', how='outer').fillna(0)
Out[412]:
        bird  value_x  value_y
0   bluebird       34       78
1  chickadee      168        0
2      eagle       10        0
3       hawk       67        3
4    sparrow        2      178
5  albatross        0       56
6    pelican        0       19
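For anyone who wants to try the outer merge end to end, here is a minimal sketch. The two frames, their sample values, and the column names 'bird' and 'value' are assumptions made up for illustration, not data from the question.

```python
import pandas as pd

# Hypothetical sample frames; 'bird' and 'value' are assumed column names.
df1 = pd.DataFrame({
    'bird': ['bluebird', 'chickadee', 'eagle', 'hawk', 'sparrow'],
    'value': [34, 168, 10, 67, 2],
})
df2 = pd.DataFrame({
    'bird': ['albatross', 'bluebird', 'hawk', 'pelican', 'sparrow'],
    'value': [56, 78, 3, 19, 178],
})

# An outer merge keeps every bird seen in either frame; the overlapping
# 'value' columns get the default suffixes _x and _y.
merged = df1.merge(df2, on='bird', how='outer')

# Birds missing from one frame come back as NaN, so fill with 0 and cast
# back to integers (NaN forces the columns to float).
merged = merged.fillna(0).astype({'value_x': int, 'value_y': int})
print(merged)
```

The row order of an outer merge can differ between pandas versions, so it is probably safer to sort on 'bird' afterwards than to rely on it.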
Solution 2:
I would think the fastest way is to set the column you want to merge on as the index, create a list of the DataFrames, and then pd.concat them. Something like this:
import os
import pandas as pd

directory = os.path.expanduser('~/home')
files = os.listdir(directory)
dfs = []
for filename in files:
    # Only read the .tsv files in the directory
    if filename.endswith('.tsv'):
        df = pd.read_table(os.path.join(directory, filename), sep='\t').set_index('bird')
        dfs.append(df)
# Align every frame on the shared 'bird' index, side by side
master_df = pd.concat(dfs, axis=1)
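A self-contained variant of the concat approach, as a sketch only. It writes two tiny TSV files to a temporary directory so it can run anywhere. The file names, the 'bird' and 'value' headers, and the output name master.csv are all assumptions for illustration. Naming each column after its file is a design choice added here so the combined columns stay distinguishable.

```python
import glob
import os
import tempfile

import pandas as pd

# Build two tiny TSV files in a temporary directory; the file names and
# the 'bird'/'value' headers are made up for this sketch.
directory = tempfile.mkdtemp()
samples = {
    'site_a.tsv': 'bird\tvalue\nbluebird\t34\nhawk\t67\n',
    'site_b.tsv': 'bird\tvalue\nbluebird\t78\npelican\t19\n',
}
for name, text in samples.items():
    with open(os.path.join(directory, name), 'w') as handle:
        handle.write(text)

# Read each file as a Series indexed by bird and named after its file.
columns = []
for path in sorted(glob.glob(os.path.join(directory, '*.tsv'))):
    label = os.path.splitext(os.path.basename(path))[0]
    series = pd.read_csv(path, sep='\t', index_col='bird')['value']
    columns.append(series.rename(label))

# axis=1 aligns on the index (an outer join by default); birds absent
# from a file become NaN, which is then replaced with 0.
master_df = pd.concat(columns, axis=1).fillna(0).astype(int)
master_df.to_csv(os.path.join(directory, 'master.csv'))
print(master_df)
```

With a few thousand files, a single pd.concat over the whole list should generally be cheaper than merging the frames pairwise in a loop, since the alignment happens once.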