
How Can I Run A Set Method Over Lists In Terms Of Dictionary Keys / Values To Find Unique Items And List The Comparison Results?

I have a dictionary with values as lists of text values. (ID : [text values]) Below is an excerpt. data_dictionary = { 52384: ['text2015', 'webnet'], 18720: ['datascience'

Solution 1:

It's not the full solution to your problem, but I believe it solves most of it:

In [1]: data_dictionary = {
   ...:     52384: ['text2015', 'webnet'],
   ...:     18720: ['datascience', 'bigdata', 'links'],
   ...:     82465: ['biological', 'biomedics', 'datamining', 'datamodel', 'semantics'],
   ...:     73120: ['links', 'scientometrics'],
   ...:     22276: ['text2015', 'webnet'],
   ...:     97376: ['text2015', 'webnet'],
   ...:     43424: ['biological', 'biomedics', 'datamining', 'datamodel', 'semantics'],
   ...:     23297: ['links', 'scientometrics'],
   ...:     45233: ['webnet', 'conference', 'links']
   ...: }

In [2]: from itertools import combinations
   ...:
   ...: intersections = []
   ...:
   ...: for first, second in combinations(data_dictionary.items(), r=2):
   ...:     intersection = set(first[1]) & set(second[1])
   ...:     if intersection:
   ...:         intersections.append((first[0], second[0], list(intersection)))
   ...:

In [3]: intersections
Out[3]:
[(52384, 22276, ['webnet', 'text2015']),
 (52384, 97376, ['webnet', 'text2015']),
 (52384, 45233, ['webnet']),
 (18720, 73120, ['links']),
 (18720, 23297, ['links']),
 (18720, 45233, ['links']),
 (82465,
  43424,
  ['semantics', 'datamodel', 'biological', 'biomedics', 'datamining']),
 (73120, 23297, ['links', 'scientometrics']),
 (73120, 45233, ['links']),
 (22276, 97376, ['webnet', 'text2015']),
 (22276, 45233, ['webnet']),
 (97376, 45233, ['webnet']),
 (23297, 45233, ['links'])]

What it does: it builds every pair of entries in your data_dictionary and checks whether the intersection of their values is non-empty. If it is, it appends a tuple of the form (key1, key2, intersection) to the intersections list.
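The same idea can be wrapped in a function. This is only a sketch, not part of the original answer: the name pairwise_intersections and the three-entry sample are made up for illustration, and it converts each list to a set once up front rather than once per pair.

```python
from itertools import combinations


def pairwise_intersections(data):
    """Return (key1, key2, shared_values) for every pair of keys whose lists overlap."""
    # Convert each list to a set once, instead of once per pair.
    as_sets = {key: set(values) for key, values in data.items()}
    results = []
    for (key1, set1), (key2, set2) in combinations(as_sets.items(), 2):
        shared = set1 & set2
        if shared:
            # Sort the shared values so the output order is deterministic.
            results.append((key1, key2, sorted(shared)))
    return results


# A small subset of the dictionary from the question.
sample = {
    52384: ['text2015', 'webnet'],
    18720: ['datascience', 'bigdata', 'links'],
    45233: ['webnet', 'conference', 'links'],
}
result = pairwise_intersections(sample)
for row in result:
    print(row)
```

Sorting the shared values is a choice made here so that repeated runs print the same thing; the list(intersection) call in the session above returns the values in arbitrary set order.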

I hope this gives you a quick start from which you can finish your task.

Solution 2:

Building on vishes_shell's answer above, I managed to get most of the desired output. Adding the individual sums to the output would mean rerunning the extract_sum method, which seems suboptimal, so I left that out of the solution while I think of a different approach.

from itertools import combinations

# extract_sum, sum_dict and filename (a file object open for writing) are not shown here.
for first, second in combinations(data_dictionary.items(), r=2):
    intersection = set(first[1]) & set(second[1])
    if intersection:
        sum1 = extract_sum(first[0], sum_dict)
        sum2 = extract_sum(second[0], sum_dict)
        # Put the key with the smaller sum first.
        if sum1 < sum2:
            early = first[0]
            late = second[0]
        else:
            early = second[0]
            late = first[0]

        filename.write('%d , %d , %s' % (early, late, list(intersection)))
        filename.write('\n')
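Since extract_sum and sum_dict are not shown, here is a self-contained sketch of the same ordering step under stated assumptions: sum_dict is assumed to map each ID to a number, extract_sum is assumed to be a plain lookup, and the sample values are invented. The function name ordered_intersections is also hypothetical, and it returns rows instead of writing to a file.

```python
from itertools import combinations


def extract_sum(key, sum_dict):
    # Hypothetical stand-in: the real helper is not shown in the answer.
    return sum_dict[key]


def ordered_intersections(data, sum_dict):
    """Return (early, late, shared) rows, with the lower-sum key first."""
    rows = []
    for (key1, values1), (key2, values2) in combinations(data.items(), 2):
        shared = set(values1) & set(values2)
        if shared:
            # Same rule as the loop above: on a tie, the second key comes first.
            if extract_sum(key1, sum_dict) < extract_sum(key2, sum_dict):
                early, late = key1, key2
            else:
                early, late = key2, key1
            rows.append((early, late, sorted(shared)))
    return rows


# Invented sample data, for illustration only.
sample_data = {1: ['a', 'b'], 2: ['b', 'c'], 3: ['d']}
sample_sums = {1: 10, 2: 5, 3: 7}
rows = ordered_intersections(sample_data, sample_sums)
for early, late, shared in rows:
    print('%d , %d , %s' % (early, late, shared))
```

Returning the rows keeps the comparison logic separate from the output, so the same rows can be written to a file or extended with the sums later.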
