Let's say I have a DataFrame named df that looks like the one below. It has a column of arrays, and I want to count the frequency of every value that appears in those arrays, ideally grouped by the "Group" column. The arrays don't all have to be the same length.
| Group | Value    |
|-------|----------|
| a     | [v1, v2] |
| a     | [v1, v3] |
| b     | [v2, v3] |
When I call df.groupby('Group')['Value'].value_counts(), I get the following output:
| Group | Value    | Freq |
|-------|----------|------|
| a     | [v1, v2] | 1    |
| a     | [v1, v3] | 1    |
| b     | [v2, v3] | 1    |
However, I really want each individual value inside the arrays to be counted on its own. In other words, I am looking for an output like this:
| Group | Value | Freq |
|-------|-------|------|
| a     | [v1]  | 2    |
| a     | [v2]  | 1    |
| a     | [v3]  | 1    |
| b     | [v2]  | 1    |
| b     | [v3]  | 1    |
How can I achieve this? Maybe some sort of flattening? I'm not sure if flattening would break my ability to group them though. Any help would be appreciated!
CodePudding user response:
Assuming actual lists in "Value":
(df.explode('Value')                 # one row per (Group, element) pair
 .value_counts()                     # count each unique (Group, Value) row
 .reset_index(name='Freq')           # back to a flat DataFrame
)
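For reference, here is a minimal, runnable sketch of that chain on the sample data from the question (the 'v1'/'v2'/'v3' strings are placeholders for whatever your arrays actually hold):

import pandas as pd

# rebuild the question's sample frame; the values are stand-ins
df = pd.DataFrame({
    'Group': ['a', 'a', 'b'],
    'Value': [['v1', 'v2'], ['v1', 'v3'], ['v2', 'v3']],
})

out = (df.explode('Value')
         .value_counts()
         .reset_index(name='Freq'))
print(out)  # rows come out sorted by Freq, descending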
Or:
(df.assign(Freq=1)                   # seed every row with a count of 1
 .explode('Value')
 .groupby(['Group', 'Value'], as_index=False)
 .sum()                              # sum the 1s per (Group, Value) pair
)
Or:
(df.explode('Value')
 .groupby(['Group', 'Value'], as_index=False)
 .agg(Freq=('Value', 'size'))        # named aggregation: group size as "Freq"
)
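All three chains give the same counts; note that the value_counts version returns rows sorted by frequency rather than by group, and the named-aggregation version is the most explicit about where the "Freq" column comes from.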
NB. If "Value" holds strings such as "[v1, v2]" rather than lists, add .str[1:-1].str.split(r',\s*') before the explode to turn them into lists.
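For instance, a minimal sketch of that preprocessing, assuming the strings look exactly like "[v1, v2]" (no quotes or nested brackets):

import pandas as pd

# hypothetical frame where "Value" holds strings instead of lists
df_str = pd.DataFrame({
    'Group': ['a', 'a', 'b'],
    'Value': ['[v1, v2]', '[v1, v3]', '[v2, v3]'],
})

out = (df_str
       .assign(Value=df_str['Value'].str[1:-1]           # strip "[" and "]"
                                    .str.split(r',\s*'))  # split on commas
       .explode('Value')
       .groupby(['Group', 'Value'], as_index=False)
       .agg(Freq=('Value', 'size')))

This ends at the same result as the list case.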
Output:

  Group Value  Freq
0     a    v1     2
1     a    v2     1
2     a    v3     1
3     b    v2     1
4     b    v3     1