I have the following test list:
testing = [
    {'score': [('a', 90)], 'text': 'abc'},
    {'score': [('a', 80)], 'text': 'kuku'},
    {'score': [('a', 70)], 'text': 'lulu'},
    {'score': [('b', 90)], 'text': 'dalu'},
    {'score': [('b', 86)], 'text': 'pupu'},
    {'score': [('b', 80)], 'text': 'mumu'},
    {'score': [('c', 46)], 'text': 'foo'},
    {'score': [('c', 26)], 'text': 'too'}
]
I would like to go through each dict, group by the first element of the score tuple (a, b or c), average the second element, and collect the texts for each group, to get the following:
{"a": {"avg_score": 80, "texts_unique": ['abc', 'kuku', 'lulu']}, "b": the same logic... }
I have seen a pandas approach. Is there a best practice for doing this?
CodePudding user response:
Try:
from statistics import mean

testing = [
    {"score": [("a", 90)], "text": "abc"},
    {"score": [("a", 80)], "text": "kuku"},
    {"score": [("a", 70)], "text": "lulu"},
    {"score": [("b", 90)], "text": "dalu"},
    {"score": [("b", 86)], "text": "pupu"},
    {"score": [("b", 80)], "text": "mumu"},
    {"score": [("c", 46)], "text": "foo"},
    {"score": [("c", 26)], "text": "too"},
]

# First pass: collect (score, text) pairs under each letter.
out = {}
for d in testing:
    out.setdefault(d["score"][0][0], []).append((d["score"][0][1], d["text"]))

# Second pass: reduce each list of pairs to an average and the unique texts.
out = {
    k: {
        "avg_score": mean(i for i, _ in v),
        "texts_unique": list(set(i for _, i in v)),
    }
    for k, v in out.items()
}
print(out)
Prints:
{
"a": {"avg_score": 80, "texts_unique": ["abc", "kuku", "lulu"]},
"b": {
"avg_score": 85.33333333333333,
"texts_unique": ["mumu", "dalu", "pupu"],
},
"c": {"avg_score": 36, "texts_unique": ["foo", "too"]},
}
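The order inside texts_unique can differ between runs, because a set does not keep insertion order. If first-seen order matters, one possible variant is sketched below. It assumes Python 3.7+ (where dicts preserve insertion order), and the function name summarize is hypothetical, not part of the answer above:

```python
from collections import defaultdict
from statistics import mean


def summarize(records):
    """Group by score label; average the scores, keep unique texts in first-seen order."""
    scores = defaultdict(list)
    texts = defaultdict(dict)  # dict keys serve as an insertion-ordered set
    for rec in records:
        label, value = rec["score"][0]
        scores[label].append(value)
        texts[label][rec["text"]] = None
    return {
        label: {"avg_score": mean(vals), "texts_unique": list(texts[label])}
        for label, vals in scores.items()
    }


# Small sample with a repeated text to show the deduplication.
sample = [
    {"score": [("a", 90)], "text": "abc"},
    {"score": [("a", 80)], "text": "kuku"},
    {"score": [("a", 70)], "text": "abc"},
]
print(summarize(sample))
```

The duplicate "abc" is stored only once, and the texts come back in the order they first appeared.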
CodePudding user response:
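You can use itertools.groupby to group your data by the letter key, then use a helper function to build the desired object for each letter. Note that groupby only groups consecutive items, so this relies on the list already being ordered by letter, as it is here: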
import itertools

def grouper(g):
    return {'avg_score': sum(t['score'][0][1] for t in g) / len(g),
            'texts_unique': list(set(t['text'] for t in g))}

res = {k: grouper(list(g))
       for k, g in itertools.groupby(testing, key=lambda t: t['score'][0][0])}
Output:
{
"a": {
"avg_score": 80.0,
"texts_unique": [
"abc",
"lulu",
"kuku"
]
},
"b": {
"avg_score": 85.33333333333333,
"texts_unique": [
"mumu",
"dalu",
"pupu"
]
},
"c": {
"avg_score": 36.0,
"texts_unique": [
"foo",
"too"
]
}
}
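If the input is not already ordered by letter, groupby yields the same letter more than once and later groups overwrite earlier ones in the resulting dict. A minimal sketch of a sort-first variant follows. The names label_of and group_sorted are hypothetical, and sorting the texts is an assumption made here only to get a deterministic order:

```python
from itertools import groupby


def label_of(record):
    """Return the letter in the first score tuple."""
    return record["score"][0][0]


def group_sorted(records):
    """Sort by label first so groupby sees each label as one consecutive run."""
    result = {}
    for label, group in groupby(sorted(records, key=label_of), key=label_of):
        items = list(group)  # materialize: the group iterator can be consumed only once
        result[label] = {
            "avg_score": sum(r["score"][0][1] for r in items) / len(items),
            "texts_unique": sorted({r["text"] for r in items}),
        }
    return result


# Deliberately interleaved labels.
shuffled = [
    {"score": [("b", 90)], "text": "dalu"},
    {"score": [("a", 90)], "text": "abc"},
    {"score": [("b", 80)], "text": "mumu"},
    {"score": [("a", 70)], "text": "lulu"},
]
print(group_sorted(shuffled))
```

Sorting costs O(n log n), so for large unsorted inputs a single pass with a dict may be the simpler choice.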