I have a dataframe containing the following values:
   dt                         mentions   hod
0  2022-02-16 10:15:42+00:00  [$X]       10
1  2022-02-16 10:13:13+00:00  [$X]       10
2  2022-02-15 13:41:07+00:00  [$Y]       13
3  2022-02-15 13:27:25+00:00  [$Z, $Y]   13
4  2022-02-15 13:15:52+00:00  [$Z]       13
I am grouping by the hour of day:

hod_group = df.groupby(['hod'])

However, I can't figure out how to get the result into this format after the groupby:
mentions              hod
{"$X": 2}             10
{"$Y": 2, "$Z": 2}    13
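To experiment with the answers, the sample frame can be rebuilt roughly as follows. This is a sketch under assumptions: mentions holds Python lists of strings, dt is a UTC timestamp, and hod is the hour taken from dt with the .dt.hour accessor (the question does not show how hod was created).

```python
import pandas as pd

# Sample data from the question. Assumption: 'mentions' holds Python lists
# of strings and 'dt' is a timezone-aware UTC timestamp.
df = pd.DataFrame({
    "dt": pd.to_datetime([
        "2022-02-16 10:15:42+00:00",
        "2022-02-16 10:13:13+00:00",
        "2022-02-15 13:41:07+00:00",
        "2022-02-15 13:27:25+00:00",
        "2022-02-15 13:15:52+00:00",
    ]),
    "mentions": [["$X"], ["$X"], ["$Y"], ["$Z", "$Y"], ["$Z"]],
})

# Hour of day taken from the timestamp (assumed to be how 'hod' was built).
df["hod"] = df["dt"].dt.hour
print(df)
```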
Answer:
Flatten each group's lists with itertools.chain.from_iterable, then count the items with collections.Counter, all inside a lambda passed to agg:
from collections import Counter
from itertools import chain

# Flatten all mention lists in a group into one iterable, then count each mention
f = lambda x: Counter(chain.from_iterable(x))

hod_group = df.groupby('hod')['mentions'].agg(f).reset_index(name='mentioned')
print(hod_group)

Output:

   hod           mentioned
0   10           {'$X': 2}
1   13  {'$Y': 2, '$Z': 2}
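For a snippet that runs on its own, here is a sketch of the same Counter approach with the sample data inlined (assuming hod has already been computed). count_mentions is a hypothetical helper name; it wraps the Counter in dict() so each cell is a plain dictionary, although Counter is already a dict subclass, so either form behaves like a dict.

```python
from collections import Counter
from itertools import chain

import pandas as pd

# Sample data from the question, with 'hod' already computed.
df = pd.DataFrame({
    "mentions": [["$X"], ["$X"], ["$Y"], ["$Z", "$Y"], ["$Z"]],
    "hod": [10, 10, 13, 13, 13],
})


def count_mentions(lists):
    """Flatten one group's lists and count each mention (hypothetical helper)."""
    return dict(Counter(chain.from_iterable(lists)))


result = (
    df.groupby("hod")["mentions"]
    .agg(count_mentions)               # one {mention: count} dict per hour
    .reset_index(name="mentioned")     # turn 'hod' back into a column
)
print(result)
```

A named function instead of a lambda is purely a readability choice; pandas calls it once per hod group with that group's mentions Series.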
Answer:
You could explode the mentions column so each mention gets its own row, then apply value_counts within each group:
(df.explode('mentions')                          # one row per individual mention
   .groupby('hod')['mentions']
   .agg(lambda s: s.value_counts().to_dict())    # {mention: count} per hour
   .reset_index()
)
Output:

   hod            mentions
0   10           {'$X': 2}
1   13  {'$Y': 2, '$Z': 2}
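The question's target shows double-quoted keys ({"$X": 2}), which looks like JSON. If JSON strings are actually wanted (an assumption; the question doesn't say), each dict can be passed through json.dumps after the explode-based aggregation. In this self-contained sketch the mentions_json column name is hypothetical, and the counts are cast to int because some pandas versions may return NumPy integers from to_dict, which json.dumps rejects.

```python
import json

import pandas as pd

# Sample data from the question, with 'hod' already computed.
df = pd.DataFrame({
    "mentions": [["$X"], ["$X"], ["$Y"], ["$Z", "$Y"], ["$Z"]],
    "hod": [10, 10, 13, 13, 13],
})

# One row per individual mention, then count mentions within each hour.
out = (
    df.explode("mentions")
    .groupby("hod")["mentions"]
    .agg(lambda s: s.value_counts().to_dict())
    .reset_index()
)

# Optional, assuming JSON text is wanted: cast counts to plain int and
# serialise each dict so keys come out double-quoted, e.g. '{"$X": 2}'.
out["mentions_json"] = out["mentions"].map(
    lambda d: json.dumps({k: int(v) for k, v in d.items()})
)
print(out)
```

Keeping both columns lets you use the dicts for further processing in Python and the JSON strings for export.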