Replace Duplicate Values Across Columns In Pandas

I have a simple dataframe as such: df = [ {'col1' : 'A', 'col2': 'B', 'col3': 'C', 'col4':'0'}, {'col1' : 'M', 'col2': '0', 'col3': 'M', 'col4':'0'}, {

Solution 1:

You can use the duplicated method, which returns a boolean Series marking every element that repeats an earlier one (the first occurrence is not marked):

In [214]: pd.Series(['M', '0', 'M', '0']).duplicated()
Out[214]:
0    False
1    False
2     True
3     True
dtype: bool

You can then build a mask by applying duplicated to each row of your dataframe (axis=1), and use where to replace the flagged values with 0:

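# True wherever a value already appeared in an earlier column of the same row
is_duplicate = df.apply(pd.Series.duplicated, axis=1)
# keep values where the mask is False, put 0 everywhere else
df.where(~is_duplicate, 0)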

  col1 col2 col3 col4
0    A    B    C    0
1    M    0    0    0
2    B    0    0    0
3    X    0    Y    0
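For reference, here is a minimal self-contained sketch of the same approach. It assumes the list of dicts from the question is passed to pd.DataFrame, and it uses only the two rows that are fully visible in the question (the rest of the data is cut off). Substituting the string '0' rather than the integer 0 is my own choice here, so that the replacement matches the string values already in the frame; the name result is just illustrative.

```python
import pandas as pd

# Only the first two rows are taken from the question; the remaining rows
# are truncated there, so this frame is a partial illustration.
df = pd.DataFrame([
    {'col1': 'A', 'col2': 'B', 'col3': 'C', 'col4': '0'},
    {'col1': 'M', 'col2': '0', 'col3': 'M', 'col4': '0'},
])

# For each row, flag values that already appeared in an earlier column.
is_duplicate = df.apply(pd.Series.duplicated, axis=1)

# Keep values where the flag is False; substitute the string '0' elsewhere
# (assumption: a string keeps the column contents consistent).
result = df.where(~is_duplicate, '0')
print(result)
```

The first row has no repeats and is left unchanged. In the second row, the second 'M' and the second '0' are flagged, so the row becomes M, 0, 0, 0. Note that where returns a new dataframe and does not modify df, so assign the result if you want to keep it.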
