Finding Duplicates In Two Dataframes And Removing The Duplicates From One Dataframe

January 20, 2024

Working in Python with pandas, I have these two dataframes:

Dataframe one:

   1          2    3
1  Stockholm  100  250
2  Stockholm  150  376
3  St

Solution 1:

Use:

import pandas as pd

df_merge = pd.merge(df1, df2, on=[1, 2, 3], how='inner')  # rows present in both dataframes
df1 = pd.concat([df1, df_merge])  # DataFrame.append was removed in pandas 2.0

df1['Duplicated'] = df1.duplicated(keep=False)  # keep=False marks every copy of a duplicated row with True
df_final = df1[~df1['Duplicated']]  # select only the rows which are not duplicated
df_final = df_final.drop(columns='Duplicated')  # drop the indicator column

The idea is as follows:

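1. Do an inner join on all the columns.
2. Append the output of the inner join to df1.
3. Identify the duplicated rows in df1.
4. Select the rows in df1 that are not duplicated.

Each number corresponds, in order, to one step of the code above: the merge, the append, the duplicated flag, and the selection. The last line only removes the helper Duplicated column.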
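
To make the four steps concrete, here is a minimal end-to-end sketch. The helper name drop_rows_present_in and the sample rows are made up for illustration (the original data is not fully shown), and the sketch assumes both dataframes share the same column labels.

```python
import pandas as pd


def drop_rows_present_in(df1, df2):
    """Return the rows of df1 that do not also appear in df2 (hypothetical helper)."""
    cols = list(df1.columns)
    # Step 1: an inner join on all columns keeps only rows found in both frames.
    common = pd.merge(df1, df2, on=cols, how="inner")
    # Step 2: append the shared rows to df1, so each of them now occurs at least twice.
    combined = pd.concat([df1, common], ignore_index=True)
    # Step 3: keep=False flags every copy of a repeated row, not just the later ones.
    is_dup = combined.duplicated(keep=False)
    # Step 4: keep only the rows that were never repeated.
    return combined[~is_dup].reset_index(drop=True)


# Made-up sample data; the column labels 1, 2, 3 mirror the question.
df1 = pd.DataFrame({1: ["Stockholm", "Stockholm", "Gothenburg"],
                    2: [100, 150, 200],
                    3: [250, 376, 410]})
df2 = pd.DataFrame({1: ["Stockholm", "Malmo"],
                    2: [150, 300],
                    3: [376, 520]})

result = drop_rows_present_in(df1, df2)
print(result)
```

One thing to keep in mind with this approach: because keep=False flags every repeated row, any row that already appears more than once inside df1 is dropped as well, even if it never occurs in df2.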
