Compare Two Csv Files With Python Pandas
I have two CSV files; both consist of two columns. The first one has the product id, and the second has the serial number. I need to look up all serial numbers from the first CSV, a
Solution 1:
I think you need merge:
import pandas as pd

A = pd.DataFrame({'productid': [1455, 5452, 3775], 'serialnumber': [44, 55, 66]})
print(A)
B = pd.DataFrame({'productid': [7000, 2000, 1000], 'serialnumber': [44, 55, 77]})
print(B)
# Inner join on the serial number keeps only serials present in both frames
print(pd.merge(A, B, on='serialnumber'))
   productid_x  serialnumber  productid_y
0         1455            44         7000
1         5452            55         2000
Solution 2:
Try this:
A = pd.read_csv("c1.csv", header=None, usecols=[0], names=['col']).drop_duplicates()
B = pd.read_csv("c2.csv", header=None, usecols=[0], names=['col']).drop_duplicates()
# A - B
pd.merge(A, B, on='col', how='left', indicator=True).query("_merge == 'left_only'")
# B - A
pd.merge(A, B, on='col', how='right', indicator=True).query("_merge == 'right_only'")
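The indicator-based merge above can be tried without any files on disk. The following is a minimal sketch, not the answerer's exact code: the in-memory CSV text, the column names productid and serial, and the choice to compare the serial-number column (rather than column 0) are all assumptions made for illustration.

```python
import io
import pandas as pd

# Hypothetical stand-ins for c1.csv and c2.csv (no header row):
# column 0 is the product id, column 1 is the serial number.
csv1 = io.StringIO("1455,44\n5452,55\n3775,66\n")
csv2 = io.StringIO("7000,44\n2000,55\n1000,77\n")

names = ['productid', 'serial']  # assumed column names
A = pd.read_csv(csv1, header=None, names=names, usecols=['serial']).drop_duplicates()
B = pd.read_csv(csv2, header=None, names=names, usecols=['serial']).drop_duplicates()

# An outer merge with indicator=True labels every serial as
# 'left_only', 'right_only' or 'both' in the _merge column.
merged = pd.merge(A, B, on='serial', how='outer', indicator=True)

only_in_A = merged.loc[merged['_merge'] == 'left_only', 'serial'].tolist()
only_in_B = merged.loc[merged['_merge'] == 'right_only', 'serial'].tolist()
in_both = sorted(merged.loc[merged['_merge'] == 'both', 'serial'].tolist())

print("only in first file: ", only_in_A)
print("only in second file:", only_in_B)
print("in both files:      ", in_both)
```

A single outer merge yields all three groups at once, whereas the left and right merges above each answer one direction of the comparison.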
Solution 3:
You can convert each DataFrame into a set of row tuples, which ignores the index when comparing the data, and then use the set method symmetric_difference:
ds1 = set([tuple(values) for values in df1.values.tolist()])
ds2 = set([tuple(values) for values in df2.values.tolist()])
ds1.symmetric_difference(ds2)
print(df1, '\n\n')
print(df2, '\n\n')
print(pd.DataFrame(list(ds1.difference(ds2))), '\n\n')
print(pd.DataFrame(list(ds2.difference(ds1))), '\n\n')
df1
    id  Name  score isEnrolled               Comment
0  111  Jack   2.17       True  He was late to class
1  112  Nick   1.11      False             Graduated
2  113   Zoe   4.12       True                   NaN
df2
    id  Name  score isEnrolled               Comment
0  111  Jack   2.17       True  He was late to class
1  112  Nick   1.21      False             Graduated
2  113   Zoe   4.12      False           On vacation
Output
     0     1     2      3          4
0  113   Zoe  4.12   True        NaN
1  112  Nick  1.11  False  Graduated

     0     1     2      3            4
0  113   Zoe  4.12  False  On vacation
1  112  Nick  1.21  False    Graduated
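The snippet in this solution assumes df1 and df2 already exist. As a rough, self-contained sketch of the same set-based idea, the two sample tables can be built by hand. The helper name rows_as_set is hypothetical, and the missing comment is written as None here instead of NaN, which is an assumption made so the row tuples compare predictably.

```python
import pandas as pd

columns = ['id', 'Name', 'score', 'isEnrolled', 'Comment']

df1 = pd.DataFrame([
    [111, 'Jack', 2.17, True, 'He was late to class'],
    [112, 'Nick', 1.11, False, 'Graduated'],
    [113, 'Zoe', 4.12, True, None],  # missing comment (assumed None, not NaN)
], columns=columns)

df2 = pd.DataFrame([
    [111, 'Jack', 2.17, True, 'He was late to class'],
    [112, 'Nick', 1.21, False, 'Graduated'],
    [113, 'Zoe', 4.12, False, 'On vacation'],
], columns=columns)


def rows_as_set(df):
    """Return the rows of df as a set of tuples; the index is ignored."""
    return {tuple(row) for row in df.values.tolist()}


ds1 = rows_as_set(df1)
ds2 = rows_as_set(df2)

only_in_df1 = ds1 - ds2  # rows that appear in df1 but not in df2
only_in_df2 = ds2 - ds1  # rows that appear in df2 but not in df1

# Sort by id so the printed order is stable (sets are unordered).
print(pd.DataFrame(sorted(only_in_df1, key=lambda r: r[0]), columns=columns))
print(pd.DataFrame(sorted(only_in_df2, key=lambda r: r[0]), columns=columns))
```

Passing columns=columns when rebuilding the frames keeps the original headers instead of the 0 to 4 labels seen in the output above. Rows are compared as whole tuples, so a change in any single field makes the row show up in both differences.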