
Finding Duplicates In Two Dataframes And Removing The Duplicates From One Dataframe

I'm working in Python with pandas and have these two DataFrames:

Dataframe one:

   1          2    3
1  Stockholm  100  250
2  Stockholm  150  376
3  St

Solution 1:

Use:

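import pandas as pd

df_merge = pd.merge(df1, df2, on=[1, 2, 3], how='inner')  # rows present in both df1 and df2
df1 = pd.concat([df1, df_merge], ignore_index=True)       # DataFrame.append was removed in pandas 2.0

df1['Duplicated'] = df1.duplicated(keep=False)  # keep=False marks every copy of a duplicated row as True
df_final = df1[~df1['Duplicated']]              # select only the rows that are not duplicated
df_final = df_final.drop(columns='Duplicated')  # drop the indicator column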
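The same logic can be run end to end as a small sketch. It assumes column labels 1, 2 and 3 as in the question; the helper name `remove_rows_found_in` and the Oslo and Bergen rows are hypothetical, while the two Stockholm rows of df1 reuse the values visible in the question. `pd.concat` stands in for `DataFrame.append`, which no longer exists in pandas 2.0.

```python
import pandas as pd


def remove_rows_found_in(df1: pd.DataFrame, df2: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: return the rows of df1 that do not also appear in df2."""
    cols = list(df1.columns)
    # Step 1: inner join on every column -> rows present in both frames
    df_merge = pd.merge(df1, df2, on=cols, how='inner')
    # Step 2: append those shared rows to df1, so each shared row now repeats
    combined = pd.concat([df1, df_merge], ignore_index=True)
    # Step 3: keep=False flags every copy of a repeated row, not only the later copies
    duplicated = combined.duplicated(keep=False)
    # Step 4: keep only the rows that occur exactly once
    return combined[~duplicated].reset_index(drop=True)


# Hypothetical sample data (column labels 1, 2, 3 as in the question)
df1 = pd.DataFrame({1: ['Stockholm', 'Stockholm', 'Oslo'],
                    2: [100, 150, 200],
                    3: [250, 376, 410]})
df2 = pd.DataFrame({1: ['Stockholm', 'Bergen'],
                    2: [150, 300],
                    3: [376, 500]})

print(remove_rows_found_in(df1, df2))
```

With this data only the Stockholm 150/376 row is shared, so the result should keep the Stockholm 100/250 row and the Oslo row.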

The idea is as follows:

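  1. Do an inner join on all the columns to find the rows that appear in both DataFrames.
  2. Append the output of the inner join to df1, so every shared row now appears at least twice.
  3. Identify the duplicated rows in df1.
  4. Keep only the rows of df1 that are not duplicated.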

Each numbered step corresponds to one statement of the code, in order; the final statement only removes the helper column.
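One caveat: `duplicated(keep=False)` also flags rows that were already repeated inside df1, so such rows are removed even if they never appear in df2. If that matters, a different technique, a left anti-join using the `indicator=True` option of `merge`, keeps them. The sketch below assumes the same column labels; the helper name `anti_join` and the sample rows are hypothetical.

```python
import pandas as pd


def anti_join(df1: pd.DataFrame, df2: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: rows of df1 with no matching row in df2, matched on all columns."""
    cols = list(df1.columns)
    # Drop duplicates in df2 first so the left join cannot multiply df1's rows
    right = df2[cols].drop_duplicates()
    # indicator=True adds a '_merge' column with 'left_only', 'right_only' or 'both'
    merged = df1.merge(right, on=cols, how='left', indicator=True)
    # Keep rows found only in df1, then drop the helper column
    keep = merged['_merge'] == 'left_only'
    return merged[keep].drop(columns='_merge').reset_index(drop=True)


# Hypothetical sample data: the Oslo row is repeated inside df1
df1 = pd.DataFrame({1: ['Stockholm', 'Stockholm', 'Oslo', 'Oslo'],
                    2: [100, 150, 200, 200],
                    3: [250, 376, 410, 410]})
df2 = pd.DataFrame({1: ['Stockholm'], 2: [150], 3: [376]})

print(anti_join(df1, df2))
```

Here both Oslo rows survive, whereas the `duplicated`-based approach would drop them because they repeat within df1.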
