Home > Back-end >  Count and group individual values in a Dataframe of arrays
Count and group individual values in a Dataframe of arrays

Time:09-21

Lets say I have a dataframe named df that looks like the one below. I have a column of arrays and I want to count the frequency of all values in the array. I would like to group these counts based on the "Group" column if possible as well. The arrays don't have to be the same length.

Group Value
a [v1, v2]
a [v1, v3]
b [v2, v3]

When I call df.groupby('Group')['Value'].value_counts() I get the following output:

Group Value Freq
a [v1, v2] 1
a [v1, v3] 1
b [v2, v3] 1

However, I really want to treat each value in the array uniquely as part of the count. In other words, I am looking for an output like this.

Group Value Freq
a [v1] 2
a [v2] 1
a [v3] 1
b [v2] 1
b [v3] 1

How can I achieve this? Maybe some sort of flattening? I'm not sure if flattening would break my ability to group them though. Any help would be appreciated!

CodePudding user response:

Assuming a list in "Value":

(df.explode('Value')
   .value_counts()
   .reset_index(name='Freq')
)

Or:

(df.assign(Freq=1)
   .explode('Value')
   .groupby(['Group', 'Value'], as_index=False)
   .sum()
)

Or:

(df.explode('Value')
   .groupby(['Group', 'Value'], as_index=False)
   .agg(Freq=('Value', 'size'))
)

NB. If you have strings, add .str[1:-1].str.split(r',\s*') before the explode.

Output:

  Group Value  Freq
0     a    v1     2
1     a    v2     1
2     a    v3     1
3     b    v2     1
4     b    v3     1
  • Related