This is the dataframe I've got:
data = {'Year' : [2021, 2021, 2021, 2022, 2022, 2022],
'Class':['A', 'A', 'B', 'A', 'C', 'C'],
'Animal':['dog|cat|bird', 'cat|dog', 'tiger|dog', 'cat|bird', 'dog|cat|rabbit', 'rabbit|dog|tiger',]}
df = pd.DataFrame(data)
So the df looks like:
Year | Class | Animal |
---|---|---|
2021 | A | dog|cat|bird |
2021 | A | cat|dog |
2021 | B | tiger|dog |
2022 | A | cat|bird |
2022 | C | dog|cat|rabbit |
2022 | C | rabbit|dog|tiger |
What I'd like to do is to calculate the number of each animal in each year and class. For example, I want to get the following dataframe:
Year | Class | Animal | Count |
---|---|---|---|
2021 | A | dog | 2 |
2021 | A | cat | 2 |
2021 | A | bird | 1 |
2021 | B | tiger | 1 |
2021 | B | dog | 1 |
2022 | A | cat | 1 |
2022 | A | bird | 1 |
2022 | C | dog | 2 |
2022 | C | cat | 1 |
2022 | C | rabbit | 2 |
2022 | C | tiger | 1 |
Does anyone have any suggestions about achieving this? I'd be really appreciate it.
CodePudding user response:
You can do this with a one-liner:
(df.assign(Animal=df['Animal'].str.split('|')) # Create a list using split
.explode('Animal') # Expand that list it rows using explode
.value_counts(sort=False) # Use pd.DataFrame.value_counts
.rename('Count') # Rename series
.reset_index()) # Reset to a dataframe
Output:
Year Class Animal Count
0 2021 A bird 1
1 2021 A cat 2
2 2021 A dog 2
3 2021 B dog 1
4 2021 B tiger 1
5 2022 A bird 1
6 2022 A cat 1
7 2022 C cat 1
8 2022 C dog 2
9 2022 C rabbit 2
10 2022 C tiger 1
CodePudding user response:
Let us try str.get_dummies
then groupby
out = (df.Animal.str.get_dummies('|')
.groupby([df['Year'],df['Class']]).sum()
.mask(lambda x : x==0)
.rename_axis(['animal'],axis=1).stack().reset_index(name='Count')
Out[666]:
Year Class animal Count
0 2021 A bird 1.0
1 2021 A cat 2.0
2 2021 A dog 2.0
3 2021 B dog 1.0
4 2021 B tiger 1.0
5 2022 A bird 1.0
6 2022 A cat 1.0
7 2022 C cat 1.0
8 2022 C dog 2.0
9 2022 C rabbit 2.0
10 2022 C tiger 1.0