DataFrame groupby on each item within a column of lists

Time:02-06

I have a dataframe (df):

| A   | B     | C                       |
| --- | ----- | ----------------------- |
| CA  | Jon   | [sales, engineering]    |
| NY  | Sarah | [engineering, IT]       |
| VA  | Vox   | [services, engineering] |

I am trying to group by each individual item in the lists in column C (sales, engineering, IT, etc.).

Tried:

```python
df.groupby('C')
```

but got an `unhashable type: 'list'` error, which is expected. I came across another post that recommended converting column C to tuples, which are hashable, but I need to group by each individual item, not by the whole combination.
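To illustrate both points, here is a small sketch that rebuilds the example frame (the construction is an assumption based on the table above). It shows that lists cannot serve as group keys, and that converting them to tuples makes grouping possible but groups by the whole combination rather than by each item.

```python
import pandas as pd

# Rebuild the example frame from the table (assumed construction).
df = pd.DataFrame({
    "A": ["CA", "NY", "VA"],
    "B": ["Jon", "Sarah", "Vox"],
    "C": [["sales", "engineering"], ["engineering", "IT"], ["services", "engineering"]],
})

# Lists are unhashable, so pandas cannot factorize them into group keys.
raised = False
try:
    df.groupby("C").size()
except TypeError as exc:
    raised = True
    print("TypeError:", exc)

# Tuples are hashable, but each whole tuple becomes one key,
# so this groups by the combination of items, not by each item.
by_combo = df.assign(C=df["C"].apply(tuple)).groupby("C").size()
print(by_combo)
```

With this data every row has a different combination, so the tuple version yields three groups of size 1 and never reports that "engineering" appears three times.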

My goal is to count, for each item in the C lists, the number of rows in df that contain it. The expected result is:

```
sales: 1
engineering: 3
IT: 1
services: 1
```

While there is probably a simpler way to obtain this than using groupby, I am still curious whether groupby can be used in this case.

Answer:

You can use `explode` followed by `value_counts`:

```python
out = df.explode("C").value_counts("C")
```

Output:

```python
print(out)
```

```
C          
engineering    3
IT             1
sales          1
services       1
dtype: int64
```
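On the groupby part of the question: once `explode` has produced one row per list item, each C value is a plain string, so an ordinary `groupby` works. The sketch below assumes pandas 0.25 or later (where `DataFrame.explode` exists) and rebuilds the example frame from the question. The second count is an optional variant, not part of the answer above: it assumes the default unnamed index, which `reset_index()` turns into a column called `"index"`, and counts distinct original rows so that an item repeated inside one list is not counted twice.

```python
import pandas as pd

# Rebuild the example frame from the question (assumed construction).
df = pd.DataFrame({
    "A": ["CA", "NY", "VA"],
    "B": ["Jon", "Sarah", "Vox"],
    "C": [["sales", "engineering"], ["engineering", "IT"], ["services", "engineering"]],
})

# explode emits one row per list item and repeats the original index label.
exploded = df.explode("C")

# Every C value is now a hashable string, so groupby works directly.
counts = exploded.groupby("C").size()
print(counts)

# size() counts occurrences; if one list could hold the same item twice,
# counting distinct original index labels counts rows instead.
rows_per_item = exploded.reset_index().groupby("C")["index"].nunique()
print(rows_per_item)
```

With the sample data both Series report engineering 3 and 1 for IT, sales, and services; they would only differ if a single row's list contained a duplicate item. Note that `groupby` sorts keys, so "IT" is listed before the lowercase names.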