
Comparing Two Csv Files Based On Specific Data In Two Columns

I was encouraged to step out of my comfort zone and use Python with little to no experience, and now I'm stuck. I'm trying to compare two CSV files (fileA.csv and fileB.csv), and a

Solution 1:

Store the information found in file A in memory first, in a set.

Then reopen file A in append mode and loop over file B. Any name from B that is not in the set can then be appended to file A:

import csv

csv_dialect = dict(delimiter=',', quotechar='|')
names = set()
# Python 3: open CSV files in text mode with newline='' (not 'rb'/'ab').
with open('fileA', newline='') as file_a:
    reader1 = csv.reader(file_a, **csv_dialect)
    next(reader1)  # skip the header row
    for row in reader1:
        names.add((row[0], row[2]))

# `names` is now a set of all names (taken from columns 0 and 2) found in file A.
with open('fileA', 'a', newline='') as file_a, open('fileB', newline='') as file_b:
    writer = csv.writer(file_a, **csv_dialect)
    reader2 = csv.reader(file_b, **csv_dialect)
    next(reader2)  # skip the header row
    for row in reader2:
        if (row[0], row[2]) not in names:
            # This row was not present in file A, add it.
            writer.writerow(row)
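The same two-pass idea (collect the keys of file A in a set, then append unseen rows of file B) can be wrapped in a reusable function. This is a sketch assuming Python 3; the name append_missing_rows and its key_cols parameter are hypothetical, not part of the original answer. Unlike the loop above, it also records each key it appends, so a name that appears twice in file B is only written once.

```python
import csv


def append_missing_rows(path_a, path_b, key_cols=(0, 2), **dialect):
    """Append rows from path_b whose key columns are not already in path_a.

    Hypothetical helper (not from the original answer). Both files are assumed
    to start with a header row; blank lines are ignored. Returns the number of
    rows appended.
    """
    def key(row):
        # The comparison key: the values in the chosen columns, as a tuple.
        return tuple(row[c] for c in key_cols)

    # Pass 1: collect the keys present in file A, skipping its header row.
    with open(path_a, newline='') as file_a:
        reader = csv.reader(file_a, **dialect)
        next(reader, None)
        seen = {key(row) for row in reader if row}

    # Pass 2: append every row of file B whose key has not been seen yet.
    added = 0
    with open(path_a, 'a', newline='') as file_a, open(path_b, newline='') as file_b:
        writer = csv.writer(file_a, **dialect)
        reader = csv.reader(file_b, **dialect)
        next(reader, None)
        for row in reader:
            if not row:
                continue
            k = key(row)
            if k not in seen:
                writer.writerow(row)
                seen.add(k)  # prevents appending the same key twice from file B
                added += 1
    return added
```

Called as append_missing_rows('fileA', 'fileB', delimiter=',', quotechar='|'), it uses the same dialect as the answer. It assumes file A ends with a newline; otherwise the first appended row is joined onto A's last line.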

Opening both files in a single with statement requires Python 2.7 or newer. In earlier Python versions, nest the two statements instead (on Python 2 the csv module expects the binary modes 'ab' and 'rb'):

with open('fileA', 'ab') as file_a:
    with open('fileB', 'rb') as file_b:
        # etc.

Solution 2:

You can try pandas, which can make working with CSV files easier and tends to be more readable:

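import pandas as pd

df1 = pd.read_csv('FileA.csv', header=None)
df2 = pd.read_csv('FileB.csv', header=None)

new_rows = []
for i in df2.index:
    # Skip the row if FileA has a row at the same index with equal values in columns 0 and 2.
    if i in df1.index:
        if df1.loc[i, 0] == df2.loc[i, 0] and df1.loc[i, 2] == df2.loc[i, 2]:
            continue
    new_rows.append(df2.loc[[i]])

# .ix and DataFrame.append were removed from recent pandas; use .loc and pd.concat.
df1 = pd.concat([df1] + new_rows, ignore_index=True)

df1.to_csv('FileA.csv', index=False, header=False)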
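The loop above only compares rows that share the same index, so a matching row stored at a different position in FileA.csv is still appended. A membership-based alternative, sketched here as a swapped-in technique rather than the answer's own method, is a left anti-join using DataFrame.merge with indicator=True. The helper name rows_missing_from is hypothetical.

```python
import pandas as pd


def rows_missing_from(df_a, df_b, key_cols):
    """Return the rows of df_b whose key_cols combination does not occur in df_a.

    Hypothetical helper: a left anti-join of df_b against df_a on key_cols.
    """
    # Deduplicate A's keys so the left merge keeps exactly one row per row of B.
    keys_a = df_a[key_cols].drop_duplicates()
    # indicator=True adds a '_merge' column; 'left_only' marks keys found only in B.
    merged = df_b.merge(keys_a, on=key_cols, how='left', indicator=True)
    # A left merge preserves B's row order, so the mask lines up with df_b by position.
    mask = (merged['_merge'] == 'left_only').to_numpy()
    return df_b[mask]
```

With the frames read as in the answer, rows_missing_from(df1, df2, [0, 2]) returns the rows to add; concatenating them onto df1 with pd.concat([df1, missing], ignore_index=True) and writing the result with to_csv updates FileA.csv.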
