Compare Two CSV Files With Python Pandas
I have two CSV files, each consisting of two columns. The first column has the product ID, and the second has the serial number. I need to look up all serial numbers from the first CSV, a
Solution 1:
I think you need merge:
import pandas as pd

A = pd.DataFrame({'productid': [1455, 5452, 3775], 'serialnumber': [44, 55, 66]})
print(A)
B = pd.DataFrame({'productid': [7000, 2000, 1000], 'serialnumber': [44, 55, 77]})
print(B)
# Inner join on the serial number keeps only serials present in both frames
print(pd.merge(A, B, on='serialnumber'))
   productid_x  serialnumber  productid_y
0         1455            44         7000
1         5452            55         2000
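If you also want to see which serial numbers from the first file have no match, a left join with indicator=True might help. This is only a minimal sketch: the column names, the file contents, and the use of io.StringIO in place of real CSV paths are assumptions for illustration, not taken from the question.

```python
import io

import pandas as pd

# Hypothetical file contents; io.StringIO stands in for real CSV paths.
first_csv = io.StringIO("productid,serialnumber\n1455,44\n5452,55\n3775,66\n")
second_csv = io.StringIO("productid,serialnumber\n7000,44\n2000,55\n1000,77\n")

A = pd.read_csv(first_csv)
B = pd.read_csv(second_csv)

# A left join keeps every row of A; indicator=True adds a _merge column
# that says whether the serial number was also found in B.
lookup = pd.merge(
    A, B,
    on="serialnumber",
    how="left",
    suffixes=("_first", "_second"),
    indicator=True,
)

found = lookup[lookup["_merge"] == "both"]
missing = lookup[lookup["_merge"] == "left_only"]

print(found)
print(missing)
```

Here found holds the serial numbers present in both files together with both product IDs, and missing holds the ones that appear only in the first file.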
Solution 2:
Try this:
import pandas as pd

# Read only the first column of each file and drop repeated values
A = pd.read_csv("c1.csv", header=None, usecols=[0], names=['col']).drop_duplicates()
B = pd.read_csv("c2.csv", header=None, usecols=[0], names=['col']).drop_duplicates()
# A - B: values present in c1.csv but not in c2.csv
pd.merge(A, B, on='col', how='left', indicator=True).query("_merge == 'left_only'")
# B - A: values present in c2.csv but not in c1.csv
pd.merge(A, B, on='col', how='right', indicator=True).query("_merge == 'right_only'")
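Both differences can probably be taken from a single outer merge as well, since the _merge column then carries left_only, right_only, and both at once. A rough sketch, with made-up single-column frames standing in for the deduplicated columns read from c1.csv and c2.csv:

```python
import pandas as pd

# Hypothetical single-column frames; in practice these would come from
# pd.read_csv(...).drop_duplicates().
A = pd.DataFrame({"col": [44, 55, 66]})
B = pd.DataFrame({"col": [44, 55, 77]})

# One outer join; the _merge column records where each value came from.
both_sides = pd.merge(A, B, on="col", how="outer", indicator=True)

only_in_a = both_sides.loc[both_sides["_merge"] == "left_only", "col"].tolist()
only_in_b = both_sides.loc[both_sides["_merge"] == "right_only", "col"].tolist()

print(only_in_a)
print(only_in_b)
```

This avoids merging twice, at the cost of building the full outer join first.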
Solution 3:
You can convert each DataFrame into a set of row tuples, which ignores the index when comparing the data, and then use the set method symmetric_difference:
ds1 = set(tuple(values) for values in df1.values.tolist())
ds2 = set(tuple(values) for values in df2.values.tolist())
ds1.symmetric_difference(ds2)
print(df1, '\n\n')
print(df2, '\n\n')
print(pd.DataFrame(list(ds1.difference(ds2))), '\n\n')
print(pd.DataFrame(list(ds2.difference(ds1))), '\n\n')
df1
    id  Name  score  isEnrolled               Comment
0  111  Jack   2.17        True  He was late to class
1  112  Nick   1.11       False             Graduated
2  113   Zoe   4.12        True                   NaN
df2
    id  Name  score  isEnrolled               Comment
0  111  Jack   2.17        True  He was late to class
1  112  Nick   1.21       False             Graduated
2  113   Zoe   4.12       False           On vacation
Output
     0     1     2      3          4
0  113   Zoe  4.12   True        NaN
1  112  Nick  1.11  False  Graduated

     0     1     2      3            4
0  113   Zoe  4.12  False  On vacation
1  112  Nick  1.21  False    Graduated
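If you want the differing rows back with their original column names instead of 0..4, the set approach could be wrapped in a small helper. This is just a sketch: row_differences is a hypothetical name, the sample frames are shortened versions of the ones above, and it assumes both frames share the same columns and that every value is hashable.

```python
import pandas as pd


def row_differences(df1, df2):
    """Return (rows only in df1, rows only in df2), ignoring the index."""
    # itertuples(index=False, name=None) yields one plain tuple per row.
    rows1 = set(df1.itertuples(index=False, name=None))
    rows2 = set(df2.itertuples(index=False, name=None))
    only1 = pd.DataFrame(list(rows1 - rows2), columns=df1.columns)
    only2 = pd.DataFrame(list(rows2 - rows1), columns=df2.columns)
    return only1, only2


# Shortened, hypothetical sample data.
df1 = pd.DataFrame({"id": [111, 112, 113],
                    "Name": ["Jack", "Nick", "Zoe"],
                    "score": [2.17, 1.11, 4.12]})
df2 = pd.DataFrame({"id": [111, 112, 113],
                    "Name": ["Jack", "Nick", "Zoe"],
                    "score": [2.17, 1.21, 4.12]})

only1, only2 = row_differences(df1, df2)
print(only1)
print(only2)
```

Sets are unordered, so the row order of the returned frames is not guaranteed; sort them afterwards if the order matters.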