I have this list of dicts (just a sample the list is bigger):
my_list = [
{'metric': {'account': '1', 'email_domain': 'gmail.com', 'version': 'a'},
'values': [[1671256800, '100'], [1671260400, '100']]},
{'metric': {'account': '1', 'email_domain': 'gmail.com', 'version': 'a'},
'values': [[1671256800, '100'], [1671260400, '100']]},
{'metric': {'account': '1', 'email_domain': 'gmail.com', 'version': 'b'},
'values': [[1671256800, '300'], [1671260400, '300']]},
{'metric': {'account': '1', 'email_domain': 'gmail.com', 'version': 'b'},
'values': [[1671256800, '300'], [1671260400, '300']]}]
I want to sum all values
for each account,email_domain by version, and update my_list accordingly.
Desired output:
my_list = [
{'metric': {'account': '1', 'email_domain': 'gmail.com', 'version': 'a'},
'values': [[1671256800, '200'], [1671260400, '200']]},
{'metric': {'account': '1', 'email_domain': 'gmail.com', 'version': 'b'},
'values': [[1671256800, '600'], [1671260400, '600']]}]
Notes:
- In
'values': [[1671256800, '600'], [1671260400, '600']]
the first value of each array is timestamp (1671256800,1671260400). - I went through a lot of threads on this site before posting this question. For this use case I could not find a correct syntax for a list of 2n dictionaries. Your help is much appreciated!
I tried following group-and-aggregate-a-list-of-dictionaries-by-multiple-keys.
I started:
d = (pd.DataFrame(my_list)).groupby(['metric']['ebs_account'], ['metric']['version']).values.
CodePudding user response:
here you go
import pandas as pd
my_list = [
{'metric': {'account': '1', 'email_domain': 'gmail.com', 'version': 'a'},
'values': [[1671256800, '100'], [1671260400, '100']]},
{'metric': {'account': '1', 'email_domain': 'gmail.com', 'version': 'a'},
'values': [[1671256800, '100'], [1671260400, '100']]},
{'metric': {'account': '1', 'email_domain': 'gmail.com', 'version': 'b'},
'values': [[1671256800, '300'], [1671260400, '300']]},
{'metric': {'account': '1', 'email_domain': 'gmail.com', 'version': 'b'},
'values': [[1671256800, '300'], [1671260400, '300']]}]
my_new_list = {}
for i in my_list:
version = i['metric']['version']
if version not in my_new_list:
my_new_list[version] = i
else:
my_new_list[version]['values'] =i['values']
df = pd.DataFrame(my_new_list[version]['values'])
df[1] = df[1].apply(lambda x:int(x))
df = df.groupby(0)[1].sum().reset_index().values.tolist()
my_new_list[version]['values'] = df
output = list(my_new_list.values())
print(output)
CodePudding user response:
Try:
from collections import Counter
out = {}
for d in my_list:
a, e, vr = (
d["metric"]["account"],
d["metric"]["email_domain"],
d["metric"]["version"],
)
for t, v in d["values"]:
out.setdefault((a, e, vr), Counter())[t] = int(v)
out = [
{
"metric": {"account": a, "email_domain": e, "version": vr},
"values": [[kk, str(vv)] for kk, vv in v.items()],
}
for (a, e, vr), v in out.items()
]
print(out)
Prints:
[
{
"metric": {"account": "1", "email_domain": "gmail.com", "version": "a"},
"values": [[1671256800, "200"], [1671260400, "200"]],
},
{
"metric": {"account": "1", "email_domain": "gmail.com", "version": "b"},
"values": [[1671256800, "600"], [1671260400, "600"]],
},
]
CodePudding user response:
You can use defaultdict
with frozenset
from collections import defaultdict
group_dict = defaultdict(dict)
for record in my_list:
key = frozenset(record['metric'].items())
for x, y in record['values']:
group_dict[key][x] = group_dict[key].setdefault(x, 0) int(y)
res = [{'metrics': dict(k), 'values': [[k, str(vv)] for k, vv in v.items()]} for k, v in group_dict.items()]
print(res)
Output:
[{'metrics': {'account': '1', 'email_domain': 'gmail.com', 'version': 'a'},
'values': [[1671256800, '200'], [1671260400, '200']]},
{'metrics': {'account': '1', 'email_domain': 'gmail.com', 'version': 'b'},
'values': [[1671256800, '600'], [1671260400, '600']]}]