
"group By" Function In Python For Array

I've tried Pandas and Numpy but haven't seen the result I want. I have a simple array that consists of several lines of this: [[customer_number, customer_name, invoice balance],[c

Solution 1:

You can build a dict keyed on the (account number, name) tuple, then loop through the rows (here `l` is the input list) and accumulate the sums in the dict. Afterward, you can convert the dict's items() back to a list:

accounts = {}

for num, name, balance in l:
    accounts[(num, name)] = accounts.get((num, name), 0) + balance
    
result = [[num, name, balance] for (num, name), balance in accounts.items()]

result will be:

[[Decimal('1111'), 'Customer1', Decimal('522.09')],
 [Decimal('1112'), 'Customer2', Decimal('177.15')],
 [Decimal('1113'), 'Customer3', Decimal('201.60')]]
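
For a version that runs on its own, here is a minimal sketch of the same dict-accumulation idea wrapped in a function. The function name `sum_by_customer` and the five sample rows (a subset of the question's data) are chosen for illustration only; they are not part of the original answer.

```python
from decimal import Decimal

def sum_by_customer(rows):
    """Sum balances per (number, name) pair, keeping first-seen order."""
    accounts = {}
    for num, name, balance in rows:
        key = (num, name)
        # Start from Decimal('0') so the totals stay Decimal throughout.
        accounts[key] = accounts.get(key, Decimal('0')) + balance
    return [[num, name, total] for (num, name), total in accounts.items()]

# Illustrative sample rows (assumed subset of the input list).
rows = [
    [Decimal('1111'), 'Customer1', Decimal('31.50')],
    [Decimal('1112'), 'Customer2', Decimal('30.88')],
    [Decimal('1111'), 'Customer1', Decimal('90.00')],
    [Decimal('1113'), 'Customer3', Decimal('30.88')],
    [Decimal('1112'), 'Customer2', Decimal('30.88')],
]

result = sum_by_customer(rows)
for row in result:
    print(row)
```

Because dicts preserve insertion order in Python 3.7+, the customers come out in the order they first appear in the input.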

Solution 2:

Just to show you that you can do this with pandas also:

In [1]: import pandas as pd

In [2]: from decimal import Decimal

In [3]: data = [[Decimal('1111'), 'Customer1', Decimal('31.50')],
   ...: [Decimal('1112'), 'Customer2', Decimal('30.88')],
   ...: [Decimal('1111'), 'Customer1', Decimal('90.00')],
   ...: [Decimal('1113'), 'Customer3', Decimal('30.88')],
   ...: [Decimal('1112'), 'Customer2', Decimal('30.88')],
   ...: [Decimal('1112'), 'Customer2', Decimal('15.00')],
   ...: [Decimal('1111'), 'Customer1', Decimal('37.93')],
   ...: [Decimal('1113'), 'Customer3', Decimal('30.88')],
   ...: [Decimal('1111'), 'Customer1', Decimal('30.88')],
   ...: [Decimal('1111'), 'Customer1', Decimal('30.88')],
   ...: [Decimal('1113'), 'Customer3', Decimal('26.60')],
   ...: [Decimal('1113'), 'Customer3', Decimal('44.22')],
   ...: [Decimal('1112'), 'Customer2', Decimal('32.93')],
   ...: [Decimal('1111'), 'Customer1', Decimal('20.00')],
   ...: [Decimal('1113'), 'Customer3', Decimal('38.14')],
   ...: [Decimal('1111'), 'Customer1', Decimal('16.60')],
   ...: [Decimal('1112'), 'Customer2', Decimal('67.46')],
   ...: [Decimal('1111'), 'Customer1', Decimal('30.88')],
   ...: [Decimal('1113'), 'Customer3', Decimal('30.88')],
   ...: [Decimal('1111'), 'Customer1', Decimal('233.42')]]

In [4]: df = pd.DataFrame(data, columns=['customer_id', 'customer_name', 'invoice_balance'])

In [5]: df
Out[5]:
   customer_id customer_name invoice_balance
0         1111     Customer1           31.50
1         1112     Customer2           30.88
2         1111     Customer1           90.00
3         1113     Customer3           30.88
4         1112     Customer2           30.88
5         1112     Customer2           15.00
6         1111     Customer1           37.93
7         1113     Customer3           30.88
8         1111     Customer1           30.88
9         1111     Customer1           30.88
10        1113     Customer3           26.60
11        1113     Customer3           44.22
12        1112     Customer2           32.93
13        1111     Customer1           20.00
14        1113     Customer3           38.14
15        1111     Customer1           16.60
16        1112     Customer2           67.46
17        1111     Customer1           30.88
18        1113     Customer3           30.88
19        1111     Customer1          233.42

Now you can use a SQL-esque declarative approach with pandas:

In [6]: df.groupby(['customer_id', 'customer_name'])['invoice_balance'].sum()
Out[6]:
customer_id  customer_name
1111         Customer1        522.09
1112         Customer2        177.15
1113         Customer3        201.60
Name: invoice_balance, dtype: object

Of course, I probably wouldn't add pandas as a dependency to your project just for this, but it is possible.

Solution 3:

# always use decimal type for money, not float
from decimal import Decimal

# input data
data = [
    [ 1, 'Bob',   Decimal('1.23') ],
    [ 2, 'Alice', Decimal('2.34') ],
    [ 1, 'Bob',   Decimal('3.45') ],
    [ 2, 'Alice', Decimal('4.56') ],
]

# sum balances into buckets by customer number
buckets = {}
for num, name, balance in data:
    buckets.setdefault(num, [num, name, Decimal('0.00')])[2] += balance

# print the result
for bucket in buckets.values():
    print(bucket)

Output:

[1, 'Bob', Decimal('4.68')]
[2, 'Alice', Decimal('6.90')]
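
For comparison, the standard library's `itertools.groupby` can produce the same totals, with the caveat that it only groups consecutive rows, so the input has to be sorted by the grouping key first. The following is a minimal sketch of that alternative, not the method used in the answer above; the function name `sum_sorted_groups` is hypothetical and the data is the same sample as in this solution.

```python
from decimal import Decimal
from itertools import groupby
from operator import itemgetter

data = [
    [1, 'Bob',   Decimal('1.23')],
    [2, 'Alice', Decimal('2.34')],
    [1, 'Bob',   Decimal('3.45')],
    [2, 'Alice', Decimal('4.56')],
]

def sum_sorted_groups(rows):
    """Group rows by (number, name) and sum the balance column."""
    key = itemgetter(0, 1)  # group on customer number and name
    result = []
    # groupby only merges adjacent rows, so sort by the same key first.
    for (num, name), group in groupby(sorted(rows, key=key), key=key):
        total = sum((row[2] for row in group), Decimal('0'))
        result.append([num, name, total])
    return result

totals = sum_sorted_groups(data)
for row in totals:
    print(row)
```

The sort makes this O(n log n) rather than the single pass of the bucket approach, so the dict-based version is usually the simpler choice unless the data is already sorted.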
