
Group By And Aggregate The Values Of A List Of Dictionaries In Python

I'm trying to write a function, in an elegant way, that will group a list of dictionaries and aggregate (sum) the values of like-keys. Example: my_dataset = [ { 'date

Solution 1:

You can use collections.Counter and collections.defaultdict.

Using a dict, the grouping runs in O(N) time, whereas an approach that sorts the data first needs O(N log N).

from collections import defaultdict, Counter

def solve(dataset, group_by_key, sum_value_keys):
    # One Counter per group; Counter.update() adds the values of matching keys
    dic = defaultdict(Counter)
    for item in dataset:
        key = item[group_by_key]
        vals = {k: item[k] for k in sum_value_keys}
        dic[key].update(vals)
    return dic

>>> d = solve(my_dataset, 'date', ['value1', 'value2'])
>>> d
defaultdict(<class 'collections.Counter'>,
{
 datetime.date(2013, 1, 2): Counter({'value2': 10, 'value1': 10}),
 datetime.date(2013, 1, 1): Counter({'value2': 20, 'value1': 20})
})
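For comparison with the O(N log N) remark above, here is a minimal sketch of the sort-based alternative using itertools.groupby. The function name solve_sorted is hypothetical, and because the question's data is truncated, the sample rows are assumed to have the same shape as the dataset in Solution 3.

```python
from collections import Counter
from datetime import date
from itertools import groupby
from operator import itemgetter


def solve_sorted(dataset, group_by_key, sum_value_keys):
    """Hypothetical sort-based grouping: O(N log N) because of the sort."""
    get_key = itemgetter(group_by_key)
    result = {}
    # groupby only merges *adjacent* equal keys, so the rows must be sorted first
    for key, items in groupby(sorted(dataset, key=get_key), key=get_key):
        totals = Counter()
        for item in items:
            totals.update({k: item[k] for k in sum_value_keys})
        result[key] = totals
    return result


# Assumed sample data (same shape as the dataset in Solution 3)
my_dataset = [
    {"date": date(2013, 1, 1), "id": 99, "value1": 10, "value2": 10},
    {"date": date(2013, 1, 1), "id": 98, "value1": 10, "value2": 10},
    {"date": date(2013, 1, 2), "id": 99, "value1": 10, "value2": 10},
]

print(solve_sorted(my_dataset, "date", ["value1", "value2"]))
```

The result is the same mapping as the defaultdict version; the only difference is the extra sorting cost, and the groups come out in sorted key order.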

The advantage of Counter is that update() sums the values of matching keys instead of overwriting them:

Example:

>>> c = Counter({'value1': 10, 'value2': 5})
>>> c.update({'value1': 7, 'value2': 3})
>>> c
Counter({'value1': 17, 'value2': 8})
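To see why this matters, here is the same update applied to a plain dict and to a Counter. dict.update replaces the existing value, while Counter.update adds to it.

```python
from collections import Counter

plain = {"value1": 10, "value2": 5}
plain.update({"value1": 7, "value2": 3})   # dict.update overwrites existing keys

counts = Counter({"value1": 10, "value2": 5})
counts.update({"value1": 7, "value2": 3})  # Counter.update adds to existing counts

print(plain)   # {'value1': 7, 'value2': 3}
print(counts)  # Counter({'value1': 17, 'value2': 8})
```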

Solution 2:

Thanks, I forgot about Counter. I still wanted to maintain the output format and sorting of my returned dataset, so here's what my final function looks like:

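from collections import Counter, defaultdict

def group_and_sum_dataset(dataset, group_by_key, sum_value_keys):
    # Accumulate per-group totals; Counter.update() adds values for matching keys
    container = defaultdict(Counter)

    for item in dataset:
        key = item[group_by_key]
        values = {k: item[k] for k in sum_value_keys}
        container[key].update(values)

    # Rebuild the original list-of-dicts shape: the group key plus the summed values.
    # (In Python 3, a list cannot be concatenated with dict.items(), so merge with ** instead.)
    new_dataset = [
        {group_by_key: key, **totals}
        for key, totals in container.items()
    ]
    new_dataset.sort(key=lambda item: item[group_by_key])

    return new_dataset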
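Solution 2 groups by a single key. As a minimal sketch, here is a hypothetical variant, group_and_sum_multi, that accepts several group-by keys (for example date and id, as in Solution 3) while keeping the same sorted list-of-dicts output. The function name and the sample data are illustrative assumptions.

```python
from collections import Counter, defaultdict
from datetime import date


def group_and_sum_multi(dataset, group_by_keys, sum_value_keys):
    """Hypothetical variant of group_and_sum_dataset that groups by several keys."""
    container = defaultdict(Counter)
    for item in dataset:
        # A tuple of the grouping fields acts as a single composite dict key
        key = tuple(item[k] for k in group_by_keys)
        container[key].update({k: item[k] for k in sum_value_keys})

    # Unpack the composite key back into named fields next to the sums
    new_dataset = [
        {**dict(zip(group_by_keys, key)), **totals}
        for key, totals in container.items()
    ]
    new_dataset.sort(key=lambda row: tuple(row[k] for k in group_by_keys))
    return new_dataset


dataset = [
    {"date": date(2013, 1, 1), "id": 99, "value1": 10, "value2": 10},
    {"date": date(2013, 1, 1), "id": 98, "value1": 10, "value2": 10},
    {"date": date(2013, 1, 2), "id": 99, "value1": 10, "value2": 10},
]

for row in group_and_sum_multi(dataset, ["date", "id"], ["value1", "value2"]):
    print(row)
```

Passing a single-element list such as ["date"] reproduces the behaviour of group_and_sum_dataset.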

Solution 3:

Here's an approach using more_itertools, where you only need to describe how the output is constructed.

Given

import datetime
import collections as ct

import more_itertools as mit


dataset = [
    {"date": datetime.date(2013, 1, 1), "id": 99, "value1": 10, "value2": 10},
    {"date": datetime.date(2013, 1, 1), "id": 98, "value1": 10, "value2": 10},
    {"date": datetime.date(2013, 1, 2), "id": 99, "value1": 10, "value2": 10}
]

Code

# Step 1: Build helper functions    
kfunc = lambda d: d["date"]
vfunc = lambda d: {k:v for k, v in d.items() if k.startswith("val")}
rfunc = lambda lst: sum((ct.Counter(d) for d in lst), ct.Counter())

# Step 2: Build a dict    
reduced = mit.map_reduce(dataset, keyfunc=kfunc, valuefunc=vfunc, reducefunc=rfunc)
reduced

Output

defaultdict(None,
            {datetime.date(2013, 1, 1): Counter({'value1': 20, 'value2': 20}),
             datetime.date(2013, 1, 2): Counter({'value1': 10, 'value2': 10})})

The items are grouped by date and pertinent values are reduced as Counters.


Details

Steps

  1. Build helper functions that customize how the keys, values and reduced values of the final defaultdict are constructed. Here we want to:
    • group by date (kfunc)
    • build dicts that keep only the "value*" entries (vfunc)
    • aggregate the dicts (rfunc) by converting them to collections.Counter objects and summing them. See an equivalent rfunc below.
  2. Pass the helper functions to more_itertools.map_reduce.
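To make the roles of keyfunc, valuefunc and reducefunc concrete, here is a rough pure-Python sketch of the grouping these steps describe. map_reduce_sketch is a hypothetical name and this is not the actual more_itertools source; it only mirrors the documented behaviour: categorize items with keyfunc, transform them with valuefunc, then summarize each category's list with reducefunc.

```python
from collections import Counter, defaultdict
from datetime import date


def map_reduce_sketch(iterable, keyfunc, valuefunc=None, reducefunc=None):
    """Hypothetical stand-in for more_itertools.map_reduce (not the library's code)."""
    groups = defaultdict(list)
    for item in iterable:
        # keyfunc picks the group; valuefunc (if given) transforms what is stored
        value = valuefunc(item) if valuefunc is not None else item
        groups[keyfunc(item)].append(value)
    if reducefunc is not None:
        # reducefunc collapses each group's list of values into one result
        for key, values in groups.items():
            groups[key] = reducefunc(values)
    groups.default_factory = None  # missing keys now raise KeyError, like a plain dict
    return groups


dataset = [
    {"date": date(2013, 1, 1), "id": 99, "value1": 10, "value2": 10},
    {"date": date(2013, 1, 1), "id": 98, "value1": 10, "value2": 10},
    {"date": date(2013, 1, 2), "id": 99, "value1": 10, "value2": 10},
]

# Same helper functions as in the answer above
kfunc = lambda d: d["date"]
vfunc = lambda d: {k: v for k, v in d.items() if k.startswith("val")}
rfunc = lambda lst: sum((Counter(d) for d in lst), Counter())

reduced = map_reduce_sketch(dataset, kfunc, vfunc, rfunc)
print(reduced)
```

Setting default_factory to None is why the real output above prints as defaultdict(None, ...).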

Simple Groupby

What if, in that example, you wanted to group by both date and id?

No problem.

>>> kfunc2 = lambda d: (d["date"], d["id"])
>>> mit.map_reduce(dataset, keyfunc=kfunc2, valuefunc=vfunc, reducefunc=rfunc)
defaultdict(None,
            {(datetime.date(2013, 1, 1),
              99): Counter({'value1': 10, 'value2': 10}),
             (datetime.date(2013, 1, 1),
              98): Counter({'value1': 10, 'value2': 10}),
             (datetime.date(2013, 1, 2),
              99): Counter({'value1': 10, 'value2': 10})})

Customized Output

While the resulting data structure clearly and concisely presents the outcome, the OP's expected output can be rebuilt as a simple list of dicts:

>>> [{**dict(date=k), **v} for k, v in reduced.items()]
[{'date': datetime.date(2013, 1, 1), 'value1': 20, 'value2': 20},
 {'date': datetime.date(2013, 1, 2), 'value1': 10, 'value2': 10}]
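The same rebuilding works for the composite date and id grouping, as long as the tuple key is unpacked into named fields. In this sketch the result printed earlier is written out by hand as reduced2 so the snippet runs without more_itertools installed; the names reduced2 and group_keys are illustrative assumptions.

```python
import datetime
from collections import Counter

# The date+id result shown earlier, typed out by hand (assumed stand-in for map_reduce output)
reduced2 = {
    (datetime.date(2013, 1, 1), 99): Counter({"value1": 10, "value2": 10}),
    (datetime.date(2013, 1, 1), 98): Counter({"value1": 10, "value2": 10}),
    (datetime.date(2013, 1, 2), 99): Counter({"value1": 10, "value2": 10}),
}

group_keys = ("date", "id")  # names for the tuple components, in kfunc2 order

# zip pairs each field name with its part of the composite key
rows = [{**dict(zip(group_keys, k)), **v} for k, v in reduced2.items()]
for row in rows:
    print(row)
```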

For more on map_reduce, see the docs. Install it with: pip install more_itertools

An equivalent reducing function:

import typing
import collections as ct

def rfunc(lst: typing.List[dict]) -> ct.Counter:
    """Return reduced mappings from map-reduce values."""
    c = ct.Counter()
    for d in lst:
        c += ct.Counter(d)
    return c
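One behavioural detail: both the sum() lambda and this loop rely on Counter addition, which discards totals that are zero or negative. That is harmless for the positive values in this question, but if value fields can be zero or negative, Counter.update() (as used in Solutions 1 and 2) keeps them. A small demonstration with made-up values:

```python
from collections import Counter

groups = [{"value1": 5, "value2": -5}, {"value1": 5, "value2": 0}]

# Counter addition (used by sum() and += in rfunc) drops results <= 0
added = sum((Counter(d) for d in groups), Counter())

# Counter.update() adds values but keeps zero and negative totals
updated = Counter()
for d in groups:
    updated.update(d)

print(added)    # Counter({'value1': 10})
print(updated)  # Counter({'value1': 10, 'value2': -5})
```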
