Match key values of multiple dictionaries inside a list in Python

I have a list containing multiple dictionaries, each of which has an ID. I need to find out which dictionaries have matching ID values, so that I can then create a new dictionary holding the average of the matching dictionaries.

Example:

[
{'id': 123, 'conversions': 1.4227642276422763, 'cpc': 2.2357723577235773, 'cpm': 4.471544715447155, 'reach': 90.65040650406505}, 
{'id': 123, 'conversions': 1.4056224899598393, 'cpc': 2.208835341365462, 'cpm': 5.622489959839357, 'reach': 89.5582329317269},
{'id': 1234, 'conversions': 1.4056224899598393, 'cpc': 2.208835341365462, 'cpm': 5.622489959839357, 'reach': 89.5582329317269},
]

In this example, id = 123 is present in two dictionaries, so I want to create a new dictionary that holds the average of those two dictionaries, like so:

{'id': 123, 'conversions': 1.414166666, 'cpc': 2.2225462, 'cpm': 5.622489959839357, 'reach': 89.5582329317269}

Please note that the averages in the example above are not exact.

My approach has been to identify the dictionaries with the same ID and store them together in a separate list. I can then use the following to average them.

from statistics import mean
# lst is a list of dictionaries that share the same id
out = {k: mean(d[k] for d in lst) for k in lst[0]}
print(out)

My problem is that I cannot work out how to identify the matching dictionaries and store them in a list.

Thank you

CodePudding user response:

Store the dictionaries in a separate list by grouping them by id:

from collections import defaultdict

data = [
    {'id': 123, 'conversions': 1.4227642276422763, 'cpc': 2.2357723577235773, 'cpm': 4.471544715447155, 'reach': 90.65040650406505}, 
    {'id': 123, 'conversions': 1.4056224899598393, 'cpc': 2.208835341365462, 'cpm': 5.622489959839357, 'reach': 89.5582329317269},
    {'id': 1234, 'conversions': 1.4056224899598393, 'cpc': 2.208835341365462, 'cpm': 5.622489959839357, 'reach': 89.5582329317269},
]

same_ids = defaultdict(list)
for item in data:
    same_ids[item["id"]].append(item)

print(same_ids)

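Output (reformatted for readability; the keys are the integers 123 and 1234, and print actually wraps the dictionary in defaultdict(<class 'list'>, ...))

{
    123: [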
        {
            "id": 123,
            "conversions": 1.4227642276422763,
            "cpc": 2.2357723577235773,
            "cpm": 4.471544715447155,
            "reach": 90.65040650406505
        },
        {
            "id": 123,
            "conversions": 1.4056224899598393,
            "cpc": 2.208835341365462,
            "cpm": 5.622489959839357,
            "reach": 89.5582329317269
        }
    ],
    1234: [
        {
            "id": 1234,
            "conversions": 1.4056224899598393,
            "cpc": 2.208835341365462,
            "cpm": 5.622489959839357,
            "reach": 89.5582329317269
        }
    ]
}
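
To answer the narrower question of which IDs actually occur more than once, the grouped result can be filtered by group length. This is only a minimal sketch: it repeats the grouping step so that it runs on its own, uses shortened sample data rather than the full records above, and the name duplicated is purely illustrative.

```python
from collections import defaultdict

# Shortened sample data (illustrative values, not the full records above)
data = [
    {"id": 123, "cpc": 2.0},
    {"id": 123, "cpc": 4.0},
    {"id": 1234, "cpc": 3.0},
]

# Group the dictionaries by their "id" value
same_ids = defaultdict(list)
for item in data:
    same_ids[item["id"]].append(item)

# Keep only the groups that contain more than one dictionary
duplicated = {key: group for key, group in same_ids.items() if len(group) > 1}

print(list(duplicated))  # [123]
```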

Now that the dictionaries are grouped by ID, we can calculate the mean per group using the calculation from your question.

from statistics import mean

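# Keys to average, taken from the first dictionary (a set, so key order in the output may vary)
keys_to_average = set(data[0].keys())
# keys_to_average.discard("id")  # Uncomment to leave id out of the result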

for key, lst in same_ids.items():
    out = {k: mean(d[k] for d in lst) for k in keys_to_average}
    print(key, out)

Output

123 {'cpc': 2.2223038495445193, 'conversions': 1.4141933588010578, 'id': 123, 'reach': 90.10431971789598, 'cpm': 5.047017337643256}
1234 {'cpc': 2.208835341365462, 'conversions': 1.4056224899598393, 'id': 1234, 'reach': 89.5582329317269, 'cpm': 5.622489959839357}
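
The two steps can also be combined into one helper that returns a list of averaged dictionaries, one per ID. The following is a rough sketch rather than a definitive implementation: the function name average_by_id, its id_key parameter, and the sample values are all made up for illustration, and it assumes every dictionary in a group has the same numeric keys. Unlike the loop above, it copies the ID across instead of averaging it.

```python
from collections import defaultdict
from statistics import mean


def average_by_id(records, id_key="id"):
    """Return one dictionary per distinct id, averaging every other key."""
    # Step 1: group the records by their id
    groups = defaultdict(list)
    for record in records:
        groups[record[id_key]].append(record)

    # Step 2: average each non-id key within every group
    averaged = []
    for group_id, group in groups.items():
        # Assumption: all records in a group share the same numeric keys
        row = {id_key: group_id}
        for key in group[0]:
            if key != id_key:
                row[key] = mean(d[key] for d in group)
        averaged.append(row)
    return averaged


# Hypothetical sample data, chosen so the averages are easy to check by hand
sample = [
    {"id": 1, "cpc": 2.0, "reach": 10.0},
    {"id": 1, "cpc": 4.0, "reach": 30.0},
    {"id": 2, "cpc": 5.0, "reach": 50.0},
]
print(average_by_id(sample))
# [{'id': 1, 'cpc': 3.0, 'reach': 20.0}, {'id': 2, 'cpc': 5.0, 'reach': 50.0}]
```

An ID that appears only once comes back unchanged, since the mean of a single value is that value.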