How do you groupby and count array frequency in pandas?

I have a dataframe containing the following values:

                         dt    mentions    hod
0 2022-02-16 10:15:42+00:00        [$X]     10
1 2022-02-16 10:13:13+00:00        [$X]     10
2 2022-02-15 13:41:07+00:00        [$Y]     13
3 2022-02-15 13:27:25+00:00    [$Z, $Y]     13
4 2022-02-15 13:15:52+00:00        [$Z]     13

I am running a groupby on the hour of day: hod_group = df.groupby(['hod'])

However, I can't figure out how to get the result into this format after the groupby:

         mentions     hod
{"$X": 2}             10
{"$Y": 2, "$Z": 2}    13

CodePudding user response：

Flatten each group's lists with chain.from_iterable, then count the values with collections.Counter inside a lambda function:

from collections import Counter
from itertools import chain

# Flatten the lists in each group, then count how often each mention occurs
f = lambda x: Counter(chain.from_iterable(x))
hod_group = df.groupby('hod')['mentions'].agg(f).reset_index(name='mentioned')
print(hod_group)
   hod           mentioned
0   10           {'$X': 2}
1   13  {'$Y': 2, '$Z': 2}
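To see what the lambda does for a single group, here is a minimal standard-library sketch. The name group_13 is only illustrative; it holds the mentions lists of the three rows with hod == 13 from the question.

```python
from collections import Counter
from itertools import chain

# Illustrative stand-in for the 'mentions' values of the hod == 13 group.
group_13 = [["$Y"], ["$Z", "$Y"], ["$Z"]]

# chain.from_iterable joins the inner lists into one flat sequence.
flat = list(chain.from_iterable(group_13))

# Counter tallies how often each mention appears in the flat sequence.
counts = Counter(flat)
print(flat)
print(dict(counts))
```

Counter is a dict subclass, so wrap the result in dict(...) if a plain dictionary is needed.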

CodePudding user response：

You could explode the lists into one row per mention, then run value_counts per group:

(df.explode('mentions')
   .groupby('hod')['mentions']
   .agg(lambda s: s.value_counts().to_dict())
   .reset_index()
)

Output:

   hod            mentions
0   10           {'$X': 2}
1   13  {'$Y': 2, '$Z': 2}
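For anyone who wants to run this end to end, here is a self-contained sketch. It rebuilds the question's data by hand, which is an assumption about how the original frame was created, and leaves out the dt column because it plays no part in the counting.

```python
import pandas as pd

# Hand-built copy of the question's data (dt column omitted on purpose).
df = pd.DataFrame({
    "mentions": [["$X"], ["$X"], ["$Y"], ["$Z", "$Y"], ["$Z"]],
    "hod": [10, 10, 13, 13, 13],
})

# explode turns each list element into its own row, repeating 'hod'.
exploded = df.explode("mentions")

# Count the mentions within each hour and store them as a plain dict.
counts = (
    exploded.groupby("hod")["mentions"]
    .agg(lambda s: s.value_counts().to_dict())
    .reset_index()
)

# Convert to an ordinary mapping so the result is easy to inspect.
result = {int(h): m for h, m in zip(counts["hod"], counts["mentions"])}
print(result)
```

The key order inside each dictionary follows value_counts and may vary when counts are tied, so compare the dictionaries by content and not by their printed form.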