Home > Enterprise >  Pandas: How to get the count of each value in a column with groupby option
Pandas: How to get the count of each value in a column with groupby option

Time:03-09

This is the dataframe I've got:

data = {'Year' : [2021, 2021, 2021, 2022, 2022, 2022], 
        'Class':['A', 'A', 'B', 'A', 'C', 'C'], 
        'Animal':['dog|cat|bird', 'cat|dog', 'tiger|dog', 'cat|bird', 'dog|cat|rabbit', 'rabbit|dog|tiger',]}
df = pd.DataFrame(data)

So the df looks like:

Year Class Animal
2021 A dog|cat|bird
2021 A cat|dog
2021 B tiger|dog
2022 A cat|bird
2022 C dog|cat|rabbit
2022 C rabbit|dog|tiger

What I'd like to do is to calculate the number of each animal in each year and class. For example, I want to get the following dataframe:

Year Class Animal Count
2021 A dog 2
2021 A cat 2
2021 A bird 1
2021 B tiger 1
2021 B dog 1
2022 A cat 1
2022 A bird 1
2022 C dog 2
2022 C cat 1
2022 C rabbit 2
2022 C tiger 1

Does anyone have any suggestions about achieving this? I'd be really appreciate it.

CodePudding user response:

You can do this with a one-liner:

(df.assign(Animal=df['Animal'].str.split('|'))     # Create a list using split
  .explode('Animal')                               # Expand that list it rows using explode
  .value_counts(sort=False)                        # Use pd.DataFrame.value_counts 
  .rename('Count')                                 # Rename series
  .reset_index())                                  # Reset to a dataframe

Output:

    Year Class  Animal  Count
0   2021     A    bird      1
1   2021     A     cat      2
2   2021     A     dog      2
3   2021     B     dog      1
4   2021     B   tiger      1
5   2022     A    bird      1
6   2022     A     cat      1
7   2022     C     cat      1
8   2022     C     dog      2
9   2022     C  rabbit      2
10  2022     C   tiger      1

CodePudding user response:

Let us try str.get_dummies then groupby

out = (df.Animal.str.get_dummies('|')
       .groupby([df['Year'],df['Class']]).sum()
       .mask(lambda x : x==0)
       .rename_axis(['animal'],axis=1).stack().reset_index(name='Count')
Out[666]: 
    Year Class  animal  Count
0   2021     A    bird    1.0
1   2021     A     cat    2.0
2   2021     A     dog    2.0
3   2021     B     dog    1.0
4   2021     B   tiger    1.0
5   2022     A    bird    1.0
6   2022     A     cat    1.0
7   2022     C     cat    1.0
8   2022     C     dog    2.0
9   2022     C  rabbit    2.0
10  2022     C   tiger    1.0
  • Related