Home > Back-end >  Missing rate per column per month in pandas Dataframe
Missing rate per column per month in pandas Dataframe

Time:11-18

Let's take the following pd.DataFrame

>>> df = pd.DataFrame({
    'M'   : ['1', '1' , '3',  '6',  '6',  '6'],
    'col1': [None, 0.1, None, 0.2,  0.3,  0.4],
    'col2': [0.01, 0.1, 1.3,  None, None, 0.5]})

which creates

    M   col1  col2
0   1   NaN   0.01
1   1   0.1   0.10
2   3   NaN   1.30
3   6   0.2   NaN
4   6   0.3   NaN
5   6   0.4   0.50

I would now like to have the missing rate percentage per month per column. The resulting table should look like this

M   col1  col2
1   50.0   0.0
3  100.0   0.0
6    0.0  66.6

where the values in the cells in col1 and col2 state the missing rates per month for the column.

How can I do this?

CodePudding user response:

You can use a groupby.mean on the boolean columns:

out = (df.drop(columns='M').isna()  # check if the value is missing
         .groupby(df['M'])          # for each M
         .mean().mul(100).round(2)  # get the proportion x 100
         .reset_index()             # index as column
      )

output:

   M   col1   col2
0  1   50.0   0.00
1  3  100.0   0.00
2  6    0.0  66.67

CodePudding user response:

Code

df.set_index('M').isna().mean(level=0).mul(100).reset_index()

Result

   M   col1   col2
0  1   50.0   0.0
1  3  100.0   0.0
2  6    0.0   66.7
  • Related