Home > Mobile >  Get proportion of missing values per Country
Get proportion of missing values per Country

Time:02-17

enter image description here

I would like to find the proportion of missing values of my features on each country and on all years to select the countries.

I tried this:

df[indicators].isna().mean().sort_values(ascending=False)

but it gives me the proportion of missing values for each indicator only...

i would like this output :

enter image description here

CodePudding user response:

You can use DataFrame.melt for reshape and then aggregate mean of missing values:

df1 = (df.melt(id_vars='Country Name', value_vars=indicators)
         .set_index('Country Name')['value'].isna()
         .groupby('Country Name')
         .mean()
         .reset_index(name='Prop'))

Or reshape by DataFrame.stack:

df1 = (df.set_index('Country Name')[indicators]
         .stack(dropna=False)
         .isna()
         .groupby('Country Name')
         .mean()
         .reset_index(name='Prop')
        )

Or use custom function:

df1 = (df.groupby('Country Name')[indicators]
         .apply(lambda x: np.mean(x.isna().to_numpy(), axis=None))
         .reset_index(name='Prop'))
                   
  • Related