
Compare Two Csv Files With Python Pandas

I have two CSV files, each with two columns. The first one has the product id, and the second has the serial number. I need to look up all serial numbers from the first CSV, a…

Solution 1:

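I think you need merge. By default it performs an inner join, so it keeps only the serial numbers present in both frames:

import pandas as pd

A = pd.DataFrame({'productid': [1455, 5452, 3775], 'serial number': [44, 55, 66]})
print(A)
B = pd.DataFrame({'productid': [7000, 2000, 1000], 'serial number': [44, 55, 77]})
print(B)
# Join on the shared column; the name must match the DataFrame key exactly
print(pd.merge(A, B, on='serial number'))

The last print shows:

   productid_x  serial number  productid_y
0         1455             44         7000
1         5452             55         2000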
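An inner merge answers "which serial numbers are in both?". If you instead want the serial numbers from A that have no match in B, a boolean mask built with Series.isin is a lighter option. A minimal sketch, reusing the same toy frames A and B (the variable names in_b, matched and missing are illustrative):

```python
import pandas as pd

A = pd.DataFrame({'productid': [1455, 5452, 3775], 'serial number': [44, 55, 66]})
B = pd.DataFrame({'productid': [7000, 2000, 1000], 'serial number': [44, 55, 77]})

# True where A's serial number also appears anywhere in B's serial number column
in_b = A['serial number'].isin(B['serial number'])

matched = A[in_b]   # serial numbers found in both frames
missing = A[~in_b]  # serial numbers that exist only in A
print(matched)
print(missing)
```

Unlike merge, this keeps A's columns unchanged (no _x/_y suffixes), which is convenient when you only need to filter A.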

Solution 2:

Try this:

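import pandas as pd

# Read only the first column of each file and drop repeated values
A = pd.read_csv("c1.csv", header=None, usecols=[0], names=['col']).drop_duplicates()
B = pd.read_csv("c2.csv", header=None, usecols=[0], names=['col']).drop_duplicates()
# A - B: values in c1.csv that are missing from c2.csv
pd.merge(A, B, on='col', how='left', indicator=True).query("_merge == 'left_only'")
# B - A: values in c2.csv that are missing from c1.csv
pd.merge(A, B, on='col', how='right', indicator=True).query("_merge == 'right_only'")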
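The same indicator idea also works with a single outer merge, which labels every value as both, left_only or right_only in one pass. A minimal sketch, assuming the serial number is the second column of each file; the in-memory file contents and the column names productid and serial are hypothetical stand-ins for c1.csv and c2.csv:

```python
import io
import pandas as pd

# Hypothetical stand-ins for c1.csv and c2.csv: two columns, no header row
c1 = io.StringIO("1455,44\n5452,55\n3775,66\n")
c2 = io.StringIO("7000,44\n2000,55\n1000,77\n")

cols = ['productid', 'serial']  # assumed column names
A = pd.read_csv(c1, header=None, names=cols)
B = pd.read_csv(c2, header=None, names=cols)

# Compare on serial number only; dropping duplicates keeps the merge one-to-one
a = A[['serial']].drop_duplicates()
b = B[['serial']].drop_duplicates()

# indicator=True adds a _merge column saying where each key was found
both = pd.merge(a, b, on='serial', how='outer', indicator=True)
only_a = both.loc[both['_merge'] == 'left_only', 'serial']   # A - B
only_b = both.loc[both['_merge'] == 'right_only', 'serial']  # B - A
common = both.loc[both['_merge'] == 'both', 'serial']        # A and B
print(only_a.tolist(), only_b.tolist(), common.tolist())
```

With real files, pass the paths to read_csv instead of the StringIO objects.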

Solution 3:

You can convert each DataFrame into a set of row tuples, which ignores the index when comparing the data, and then use the set methods symmetric_difference and difference:

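import pandas as pd
# Each row becomes a hashable tuple, so the index is not part of the comparison
ds1 = set(tuple(values) for values in df1.values.tolist())
ds2 = set(tuple(values) for values in df2.values.tolist())
diff = ds1.symmetric_difference(ds2)  # rows present in only one of the frames

print(df1, '\n\n')
print(df2, '\n\n')
print(pd.DataFrame(list(ds1.difference(ds2))), '\n\n')  # rows only in df1
print(pd.DataFrame(list(ds2.difference(ds1))), '\n\n')  # rows only in df2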

df1

    id  Name  score  isEnrolled               Comment
0  111  Jack   2.17        True  He was late to class
1  112  Nick   1.11       False             Graduated
2  113   Zoe   4.12        True                   NaN

df2

    id  Name  score  isEnrolled               Comment
0  111  Jack   2.17        True  He was late to class
1  112  Nick   1.21       False             Graduated
2  113   Zoe   4.12       False           On vacation

Output

Rows only in df1 (ds1 - ds2):

     0     1     2      3          4
0  113   Zoe  4.12   True        NaN
1  112  Nick  1.11  False  Graduated

Rows only in df2 (ds2 - ds1):

     0     1     2      3            4
0  113   Zoe  4.12  False  On vacation
1  112  Nick  1.21  False    Graduated
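As an alternative to the set approach, pandas can label the differing rows directly: an outer merge on all shared columns with indicator=True tags each row as both, left_only or right_only, and keeps the original column names instead of 0 to 4. A minimal sketch that rebuilds df1 and df2 from the example above (the dtypes chosen here are assumptions; the original frames were presumably read from CSV). One behavioral difference worth knowing: merge matches NaN keys against each other, whereas in a set of tuples two NaN values only match if they are the very same object.

```python
import numpy as np
import pandas as pd

# Rebuild the two example frames shown above
df1 = pd.DataFrame({
    'id': [111, 112, 113],
    'Name': ['Jack', 'Nick', 'Zoe'],
    'score': [2.17, 1.11, 4.12],
    'isEnrolled': [True, False, True],
    'Comment': ['He was late to class', 'Graduated', np.nan],
})
df2 = pd.DataFrame({
    'id': [111, 112, 113],
    'Name': ['Jack', 'Nick', 'Zoe'],
    'score': [2.17, 1.21, 4.12],
    'isEnrolled': [True, False, False],
    'Comment': ['He was late to class', 'Graduated', 'On vacation'],
})

# Outer merge on every shared column; _merge records which frame each row came from
cmp = df1.merge(df2, how='outer', indicator=True)
only_df1 = cmp[cmp['_merge'] == 'left_only'].drop(columns='_merge')
only_df2 = cmp[cmp['_merge'] == 'right_only'].drop(columns='_merge')
print(only_df1)
print(only_df2)
```

Only the Jack row is identical in both frames, so it is tagged both; the Nick and Zoe rows differ in at least one column and appear once in each result.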
