
Find Duplicates For Mixed Type Values In Dictionaries

I would like to recognize and group duplicate values in a dictionary. To do this I build a pseudo-hash (or rather, a signature) of my data set as follows: from pickle import dumps

Solution 1:

The first thing is to remove the call to deepcopy, which is your bottleneck here:

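import collections
import collections.abc

def faithfulrepr(ds):
    # Mappings: sort the items so key order does not affect the signature.
    # (collections.Mapping was removed in Python 3.10; use collections.abc.)
    if isinstance(ds, collections.abc.Mapping):
        res = collections.OrderedDict(
            (k, faithfulrepr(v)) for k, v in sorted(ds.items())
        )
    # Lists: build the signature element by element, preserving order.
    elif isinstance(ds, list):
        res = [faithfulrepr(v) for v in ds]
    else:
        res = ds
    return repr(res)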
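
Since the question's own code is cut off, here is only a rough sketch of how such a signature could be used to bucket keys. The helper name group_by_signature and its signature parameter are hypothetical; plain repr stands in for the signature function so the snippet runs on its own, and a function such as faithfulrepr could be passed instead.

```python
from collections import defaultdict

def group_by_signature(ds, signature=repr):
    """Group the keys of ds by the signature of their values.

    Hypothetical helper: `signature` is any callable that returns a
    hashable string for a value (plain repr is used as a stand-in).
    """
    groups = defaultdict(list)
    for key, value in ds.items():
        groups[signature(value)].append(key)
    return dict(groups)

data = {"a": [1, 2], "b": "text", "c": [1, 2]}
groups = group_by_signature(data)
# groups == {"[1, 2]": ["a", "c"], "'text'": ["b"]}
```

Because the signatures are strings, they can be used as dictionary keys, so this grouping takes a single pass over the data.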

However, sorted and repr have their drawbacks:

  1. you can't truly compare custom types;
  2. you can't use mappings with different types of keys.
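
Both drawbacks can be reproduced in a few lines. This is a small illustration under Python 3; the Point class is a made-up example type, not something from the question.

```python
class Point:
    """Hypothetical custom type: value equality, but no custom __repr__."""
    def __init__(self, x, y):
        self.x, self.y = x, y

    def __eq__(self, other):
        return isinstance(other, Point) and (self.x, self.y) == (other.x, other.y)

a, b = Point(1, 2), Point(1, 2)

# Drawback 1: the objects are equal, but the default repr embeds each
# object's memory address, so their signatures differ.
equal_by_eq = (a == b)
equal_by_repr = (repr(a) == repr(b))

# Drawback 2: in Python 3, int and str keys cannot be ordered against
# each other, so sorting the items raises TypeError.
mixed = {1: "a", "b": 2}
try:
    sorted(mixed.items())
    sort_failed = False
except TypeError:
    sort_failed = True

print(equal_by_eq, equal_by_repr, sort_failed)  # True False True
```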

So the second thing is to get rid of faithfulrepr and compare objects with __eq__:

binder, values = [], []
for key, value in ds.items():
    try:
        index = values.index(value)
    except ValueError:
        values.append(value)
        binder.append([key])
    else:
        binder[index].append(key)
grouped = dict(zip(map(tuple, binder), values))
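
As a usage sketch, the same loop can be wrapped in a function (the name group_duplicates is only illustrative) and run on a small mixed-type dictionary. Two things are worth keeping in mind: values.index does a linear scan, so the whole grouping is quadratic in the number of distinct values, and __eq__ follows Python's own rules, under which 1, 1.0 and True compare equal.

```python
def group_duplicates(ds):
    """Group keys whose values compare equal with ==.

    Illustrative wrapper around the loop above; the result maps
    tuples of keys to the first value seen for that group.
    """
    binder, values = [], []
    for key, value in ds.items():
        try:
            index = values.index(value)   # linear scan using __eq__
        except ValueError:
            values.append(value)          # first time this value is seen
            binder.append([key])
        else:
            binder[index].append(key)     # duplicate of an earlier value
    return dict(zip(map(tuple, binder), values))

data = {"a": 1, "b": 1.0, "c": True, "d": [1], "e": [1], "f": "1"}
grouped = group_duplicates(data)
# grouped == {("a", "b", "c"): 1, ("d", "e"): [1], ("f",): "1"}
```

Unhashable values such as lists and dicts work here because they only ever appear as dictionary values in the result, never as keys.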
