Grouping by the first key in a dict and applying calculations to the values of other keys (Python dict)

I have the following test list:

testing = [
    {'score': [('a', 90)], 'text': 'abc'},
    {'score': [('a', 80)], 'text': 'kuku'},
    {'score': [('a', 70)], 'text': 'lulu'},
    {'score': [('b', 90)], 'text': 'dalu'},
    {'score': [('b', 86)], 'text': 'pupu'},
    {'score': [('b', 80)], 'text': 'mumu'},
    {'score': [('c', 46)], 'text': 'foo'},
    {'score': [('c', 26)], 'text': 'too'}
]

I would like to go through each dict, group the entries by the first element of the score tuple (a, b or c), average the second elements, and collect the texts for each group, to get the following:

{"a": {"avg_score": 80, "texts_unique": ['abc', 'kuku', 'lulu']}, "b": the same logic... }
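For reference, the averages the sample data should produce work out as follows (here $\bar{s}_x$ is just shorthand for the average score of letter $x$):

```latex
\begin{aligned}
\bar{s}_a &= \frac{90 + 80 + 70}{3} = \frac{240}{3} = 80 \\
\bar{s}_b &= \frac{90 + 86 + 80}{3} = \frac{256}{3} \approx 85.33 \\
\bar{s}_c &= \frac{46 + 26}{2} = \frac{72}{2} = 36
\end{aligned}
```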

I have seen a pandas approach; is there a best practice for doing this?

Answer:

Try:

from statistics import mean

testing = [
    {"score": [("a", 90)], "text": "abc"},
    {"score": [("a", 80)], "text": "kuku"},
    {"score": [("a", 70)], "text": "lulu"},
    {"score": [("b", 90)], "text": "dalu"},
    {"score": [("b", 86)], "text": "pupu"},
    {"score": [("b", 80)], "text": "mumu"},
    {"score": [("c", 46)], "text": "foo"},
    {"score": [("c", 26)], "text": "too"},
]

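# Map each letter to a list of (score, text) pairs
out = {}
for d in testing:
    letter, score = d["score"][0]
    out.setdefault(letter, []).append((score, d["text"]))

# Replace each list with the average score and the de-duplicated texts
# (set() does not preserve order, so the text order may vary between runs)
out = {
    k: {
        "avg_score": mean(s for s, _ in v),
        "texts_unique": list(set(t for _, t in v)),
    }
    for k, v in out.items()
}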
print(out)

Prints:

{
    "a": {"avg_score": 80, "texts_unique": ["abc", "kuku", "lulu"]},
    "b": {
        "avg_score": 85.33333333333333,
        "texts_unique": ["mumu", "dalu", "pupu"],
    },
    "c": {"avg_score": 36, "texts_unique": ["foo", "too"]},
}
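Because set() discards order, the order of the texts in the output above is arbitrary, and statistics.mean returns an int when the average of int data happens to be whole (80, 36) but a float otherwise. A minimal variation, assuming Python 3.8+ (for statistics.fmean) and that order of first appearance is the order you want, uses a dict as an insertion-ordered set and always produces float averages. Looping over every tuple in each "score" list is an added assumption; the sample only ever has one tuple per list. The names groups and out are illustrative.

```python
from collections import defaultdict
from statistics import fmean

testing = [
    {"score": [("a", 90)], "text": "abc"},
    {"score": [("a", 80)], "text": "kuku"},
    {"score": [("a", 70)], "text": "lulu"},
    {"score": [("b", 90)], "text": "dalu"},
    {"score": [("b", 86)], "text": "pupu"},
    {"score": [("b", 80)], "text": "mumu"},
    {"score": [("c", 46)], "text": "foo"},
    {"score": [("c", 26)], "text": "too"},
]

# letter -> {"scores": [...], "texts": {text: None}}; the inner dict acts as
# an ordered set because dicts keep insertion order (Python 3.7+)
groups = defaultdict(lambda: {"scores": [], "texts": {}})
for d in testing:
    # Assumption: every tuple in d["score"] is a (letter, value) pair
    for letter, value in d["score"]:
        groups[letter]["scores"].append(value)
        groups[letter]["texts"][d["text"]] = None

out = {
    letter: {
        "avg_score": fmean(g["scores"]),   # always a float
        "texts_unique": list(g["texts"]),  # first-seen order, no duplicates
    }
    for letter, g in groups.items()
}
print(out)
```

The trade-off is one extra import and Python 3.8+; in exchange the result is deterministic, which matters if the output is compared in tests or serialized.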

Answer:

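You can use itertools.groupby to group your data by the letter key and then use a helper function to build the desired object for each letter. Note that groupby only groups consecutive items, so this works because the sample list is already ordered by letter; unsorted input must be sorted by the same key first: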

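import itertools

def letter(t):
    return t['score'][0][0]  # e.g. 'a'

def grouper(g):
    # g: list of the dicts that share one letter
    return {'avg_score': sum(t['score'][0][1] for t in g) / len(g),
            'texts_unique': list(set(t['text'] for t in g))}

# groupby only merges consecutive items, so sort by the same key first
res = {k: grouper(list(g)) for k, g in itertools.groupby(sorted(testing, key=letter), key=letter)}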

Output:

{
    "a": {
        "avg_score": 80.0,
        "texts_unique": [
            "abc",
            "lulu",
            "kuku"
        ]
    },
    "b": {
        "avg_score": 85.33333333333333,
        "texts_unique": [
            "mumu",
            "dalu",
            "pupu"
        ]
    },
    "c": {
        "avg_score": 36.0,
        "texts_unique": [
            "foo",
            "too"
        ]
    }
}
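A minimal sketch of why the ordering requirement on itertools.groupby matters, using a small hypothetical unsorted list (not the question's data): without sorting, the same letter produces several groups, and a dict comprehension over them silently keeps only the last one.

```python
import itertools

def letter(d):
    return d["score"][0][0]

# Hypothetical unsorted input: the two "a" records are not adjacent
unsorted = [
    {"score": [("a", 90)], "text": "abc"},
    {"score": [("b", 86)], "text": "pupu"},
    {"score": [("a", 70)], "text": "lulu"},
]

# groupby starts a new group whenever the key changes, so "a" appears twice
runs = [(k, [d["text"] for d in g]) for k, g in itertools.groupby(unsorted, key=letter)]
print(runs)     # [('a', ['abc']), ('b', ['pupu']), ('a', ['lulu'])]

# Building a dict from those runs overwrites the first "a" group
as_dict = {k: texts for k, texts in runs}
print(as_dict)  # {'a': ['lulu'], 'b': ['pupu']}

# Sorting by the same key first yields exactly one group per letter
merged = [(k, [d["text"] for d in g])
          for k, g in itertools.groupby(sorted(unsorted, key=letter), key=letter)]
print(merged)   # [('a', ['abc', 'lulu']), ('b', ['pupu'])]
```

Sorting costs O(n log n); if that matters, the dict-based grouping from the first answer avoids it because it never relies on adjacency.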