I have a DataFrame, you can have it by running:
import pandas as pd
from io import StringIO
df = """
case_id scheduled_date status_code
1213 2021-08 success
3444 2021-06 fail
4566 2021-07 unknown
12213 2021-08 unknown
34344 2021-06 fail
44566 2021-07 unknown
1213 2021-08 fail
"""
df= pd.read_csv(StringIO(df.strip()), sep='\s\s ', engine='python')
This outputs:
case_id scheduled_date status_code
0 1213 2021-08 success
1 3444 2021-06 fail
2 4566 2021-07 unknown
3 12213 2021-08 unknown
4 34344 2021-06 fail
5 44566 2021-07 unknown
6 1213 2021-08 fail
How can I count the number of success, fail, and unknown of each month?
Output should look like:
scheduled_date num of success num of fail num of unknown
2021-08 1 1 1
2021-06 0 2 0
2021-07 0 0 2
CodePudding user response:
Here is a proposition with pandas.crosstab
:
out = (
pd.crosstab(df["scheduled_date"], df["status_code"])
.rename_axis(None, axis=1)
.add_prefix("num of ")
.sort_index(ascending=False)
.reset_index()
)
# Output :
print(out)
scheduled_date num of fail num of succuss num of unknown
0 2021-08 1 1 1
1 2021-07 0 0 2
2 2021-06 2 0 0
CodePudding user response:
You can use .pivot_table()
to create counts for each (month, status code) pair, and then use .fillna
to replace NaNs with zero counts:
df.pivot_table(index="scheduled_date", columns="status_code", aggfunc=len).fillna(0)
This outputs:
case_id
status_code fail success unknown
scheduled_date
2021-06 2.0 0.0 0.0
2021-07 0.0 0.0 2.0
2021-08 1.0 1.0 1.0