Let's take the following pd.DataFrame
>>> df = pd.DataFrame({
'M' : ['1', '1' , '3', '6', '6', '6'],
'col1': [None, 0.1, None, 0.2, 0.3, 0.4],
'col2': [0.01, 0.1, 1.3, None, None, 0.5]})
which creates
M col1 col2
0 1 NaN 0.01
1 1 0.1 0.10
2 3 NaN 1.30
3 6 0.2 NaN
4 6 0.3 NaN
5 6 0.4 0.50
I would now like to have the missing rate percentage per month per column. The resulting table should look like this
M col1 col2
1 50.0 0.0
3 100.0 0.0
6 0.0 66.6
where the values in the cells in col1
and col2
state the missing rates per month for the column.
How can I do this?
CodePudding user response:
You can use a groupby.mean
on the boolean columns:
out = (df.drop(columns='M').isna() # check if the value is missing
.groupby(df['M']) # for each M
.mean().mul(100).round(2) # get the proportion x 100
.reset_index() # index as column
)
output:
M col1 col2
0 1 50.0 0.00
1 3 100.0 0.00
2 6 0.0 66.67
CodePudding user response:
Code
df.set_index('M').isna().mean(level=0).mul(100).reset_index()
Result
M col1 col2
0 1 50.0 0.0
1 3 100.0 0.0
2 6 0.0 66.7