How to aggregate distinct values of one key then sum the matching values of the other key?

Time:05-11

I've made a loop that gives me data in the following format:

name_quant = [{'name_id': 'S00004', 'quantity': '1'}, {'name_id': 'S00004', 'quantity': '2'}, {'name_id': 'S00003', 'quantity': '1'}, 
 {'name_id': 'S00003', 'quantity': '2'}, {'name_id': 'S00003', 'quantity': '2'}, {'name_id': 'S00002', 'quantity': '1'}]

I used the following loop to get the values above:

namesequence = EventSequence.objects.filter(description="names").values("Details")

# .values("Details") yields dicts keyed by "Details", so index with that key
name_quant = [{'name_id': e['Details'][33:39],
               'quantity': e['Details'][50:51]} for e in namesequence]

So my question is: how can I aggregate the name_ids and sum the quantities of matching name_ids so that I get a result like this:

 name_sum = [{'name_id': 'S00001', 'quantity': '160'}, {'name_id': 'S00002', 'quantity': '50'}, {'name_id': 'S00003', 'quantity': '40'}, {'name_id': 'S00004', 'quantity': '90'}]

I would have used the sum function in Django, but I have to subscript and loop through the values first, which makes it a bit more complicated :/

Any help is appreciated!

CodePudding user response:

If I understand the question correctly, the requirement is to sum the quantities for each distinct name_id. I can't see how the required output values are derived from the sample input data, but that may be because the sample is incomplete.

name_quant = [{'name_id': 'S00004', 'quantity': '1'}, {'name_id': 'S00004', 'quantity': '2'}, {'name_id': 'S00003', 'quantity': '1'}, 
 {'name_id': 'S00003', 'quantity': '2'}, {'name_id': 'S00003', 'quantity': '2'}, {'name_id': 'S00002', 'quantity': '1'}]

td = dict()

for e in name_quant:
    nid = e['name_id']
    td[nid] = td.get(nid, 0) + int(e['quantity'])

new_list = [{'name_id': k, 'quantity': str(v)} for k, v in td.items()]

print(new_list)

Output:

[{'name_id': 'S00004', 'quantity': '3'}, {'name_id': 'S00003', 'quantity': '5'}, {'name_id': 'S00002', 'quantity': '1'}]
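
The same single-pass accumulation can also be sketched with collections.defaultdict, sorting the keys at the end so the result comes out in name_id order like the list in the question. The helper name sum_quantities is made up for illustration, and this assumes every quantity string parses as an integer.

```python
from collections import defaultdict


def sum_quantities(rows):
    """Sum the 'quantity' strings per 'name_id' (hypothetical helper name)."""
    totals = defaultdict(int)  # missing keys start at 0
    for row in rows:
        # Assumes each quantity is an integer stored as a string, e.g. '2'.
        totals[row['name_id']] += int(row['quantity'])
    # Sort by name_id and turn the totals back into strings, as in the question.
    return [{'name_id': k, 'quantity': str(v)} for k, v in sorted(totals.items())]


name_quant = [{'name_id': 'S00004', 'quantity': '1'}, {'name_id': 'S00004', 'quantity': '2'},
              {'name_id': 'S00003', 'quantity': '1'}, {'name_id': 'S00003', 'quantity': '2'},
              {'name_id': 'S00003', 'quantity': '2'}, {'name_id': 'S00002', 'quantity': '1'}]

name_sum = sum_quantities(name_quant)
print(name_sum)
```

For the sample data this gives S00002 -> '1', S00003 -> '5' and S00004 -> '3', in that order.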

CodePudding user response:

If the name_quant list is large, I prefer to use pandas to do the grouping:

import pandas as pd

name_quant = [{'name_id': 'S00004', 'quantity': '1'}, {'name_id': 'S00004', 'quantity': '2'},
              {'name_id': 'S00003', 'quantity': '1'},
              {'name_id': 'S00003', 'quantity': '2'}, {'name_id': 'S00003', 'quantity': '2'},
              {'name_id': 'S00002', 'quantity': '1'}]

df = pd.DataFrame.from_records(name_quant)
df['quantity'] = df['quantity'].astype(int)
results = df.groupby(['name_id']).agg({'quantity': 'sum'}).to_records()  # [('S00002', 1) ('S00003', 5) ('S00004', 3)]
# int() converts the NumPy integer back to a plain Python int
grouped_name_quant = [{'name_id': x[0], 'quantity': int(x[1])} for x in results]
print(grouped_name_quant)

Output:

[{'name_id': 'S00002', 'quantity': 1}, {'name_id': 'S00003', 'quantity': 5}, {'name_id': 'S00004', 'quantity': 3}]
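
As a possible shortcut, the record-by-record list comprehension might be replaced by letting pandas build the dictionaries itself. This is only a sketch of one alternative: it assumes the same sample data, and it uses as_index=False so that name_id stays a regular column before calling to_dict('records').

```python
import pandas as pd

name_quant = [{'name_id': 'S00004', 'quantity': '1'}, {'name_id': 'S00004', 'quantity': '2'},
              {'name_id': 'S00003', 'quantity': '1'}, {'name_id': 'S00003', 'quantity': '2'},
              {'name_id': 'S00003', 'quantity': '2'}, {'name_id': 'S00002', 'quantity': '1'}]

df = pd.DataFrame.from_records(name_quant)
df['quantity'] = df['quantity'].astype(int)  # quantities arrive as strings

# as_index=False keeps name_id as a column instead of moving it into the index.
totals = df.groupby('name_id', as_index=False)['quantity'].sum()

# One dict per row: [{'name_id': ..., 'quantity': ...}, ...]
grouped = totals.to_dict('records')
print(grouped)
```

groupby sorts the group keys by default, so the rows come back ordered by name_id.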