Home > Enterprise >  Group and aggregate a list of 2n dictionaries by multiple keys
Group and aggregate a list of 2n dictionaries by multiple keys

Time:12-17

I have this list of dicts (just a sample the list is bigger):

my_list = [
{'metric': {'account': '1', 'email_domain': 'gmail.com', 'version': 'a'}, 
 'values': [[1671256800, '100'], [1671260400, '100']]},
{'metric': {'account': '1', 'email_domain': 'gmail.com', 'version': 'a'}, 
 'values': [[1671256800, '100'], [1671260400, '100']]},
{'metric': {'account': '1', 'email_domain': 'gmail.com', 'version': 'b'}, 
 'values': [[1671256800, '300'], [1671260400, '300']]},
{'metric': {'account': '1', 'email_domain': 'gmail.com', 'version': 'b'}, 
 'values': [[1671256800, '300'], [1671260400, '300']]}]

I want to sum all values for each account,email_domain by version, and update my_list accordingly. Desired output:

my_list = [
{'metric': {'account': '1', 'email_domain': 'gmail.com', 'version': 'a'}, 
 'values': [[1671256800, '200'], [1671260400, '200']]},
{'metric': {'account': '1', 'email_domain': 'gmail.com', 'version': 'b'}, 
 'values': [[1671256800, '600'], [1671260400, '600']]}]

Notes:

  • In 'values': [[1671256800, '600'], [1671260400, '600']] the first value of each array is timestamp (1671256800,1671260400).
  • I went through a lot of threads on this site before posting this question. For this use case I could not find a correct syntax for a list of 2n dictionaries. Your help is much appreciated!

I tried following group-and-aggregate-a-list-of-dictionaries-by-multiple-keys.

I started:

d = (pd.DataFrame(my_list)).groupby(['metric']['ebs_account'], ['metric']['version']).values.

CodePudding user response:

here you go

import pandas as pd

my_list = [
{'metric': {'account': '1', 'email_domain': 'gmail.com', 'version': 'a'}, 
 'values': [[1671256800, '100'], [1671260400, '100']]},
{'metric': {'account': '1', 'email_domain': 'gmail.com', 'version': 'a'}, 
 'values': [[1671256800, '100'], [1671260400, '100']]},
{'metric': {'account': '1', 'email_domain': 'gmail.com', 'version': 'b'}, 
 'values': [[1671256800, '300'], [1671260400, '300']]},
{'metric': {'account': '1', 'email_domain': 'gmail.com', 'version': 'b'}, 
 'values': [[1671256800, '300'], [1671260400, '300']]}]


my_new_list = {}
for i in my_list:
    version = i['metric']['version']
    if version not in my_new_list:
        my_new_list[version] = i
    else:
        my_new_list[version]['values'] =i['values']
        df = pd.DataFrame(my_new_list[version]['values'])
        df[1] = df[1].apply(lambda x:int(x))
        df = df.groupby(0)[1].sum().reset_index().values.tolist()
        my_new_list[version]['values'] = df

output = list(my_new_list.values())
print(output)

CodePudding user response:

Try:

from collections import Counter

out = {}
for d in my_list:
    a, e, vr = (
        d["metric"]["account"],
        d["metric"]["email_domain"],
        d["metric"]["version"],
    )

    for t, v in d["values"]:
        out.setdefault((a, e, vr), Counter())[t]  = int(v)

out = [
    {
        "metric": {"account": a, "email_domain": e, "version": vr},
        "values": [[kk, str(vv)] for kk, vv in v.items()],
    }
    for (a, e, vr), v in out.items()
]

print(out)

Prints:

[
    {
        "metric": {"account": "1", "email_domain": "gmail.com", "version": "a"},
        "values": [[1671256800, "200"], [1671260400, "200"]],
    },
    {
        "metric": {"account": "1", "email_domain": "gmail.com", "version": "b"},
        "values": [[1671256800, "600"], [1671260400, "600"]],
    },
]

CodePudding user response:

You can use defaultdict with frozenset

from collections import defaultdict


group_dict = defaultdict(dict)

for record in my_list:
    key = frozenset(record['metric'].items())
    for x, y in record['values']:
        group_dict[key][x] = group_dict[key].setdefault(x, 0)   int(y)
        
res = [{'metrics': dict(k), 'values': [[k, str(vv)] for k, vv in v.items()]} for k, v in group_dict.items()]

print(res)

Output:

[{'metrics': {'account': '1', 'email_domain': 'gmail.com', 'version': 'a'},
  'values': [[1671256800, '200'], [1671260400, '200']]},
 {'metrics': {'account': '1', 'email_domain': 'gmail.com', 'version': 'b'},
  'values': [[1671256800, '600'], [1671260400, '600']]}]
  • Related