I would like to find the proportion of missing values of my features on each country and on all years to select the countries.
I tried this:
df[indicators].isna().mean().sort_values(ascending=False)
but it gives me the proportion of missing values for each indicator only...
i would like this output :
CodePudding user response:
You can use DataFrame.melt
for reshape and then aggregate mean
of missing values:
df1 = (df.melt(id_vars='Country Name', value_vars=indicators)
.set_index('Country Name')['value'].isna()
.groupby('Country Name')
.mean()
.reset_index(name='Prop'))
Or reshape by DataFrame.stack
:
df1 = (df.set_index('Country Name')[indicators]
.stack(dropna=False)
.isna()
.groupby('Country Name')
.mean()
.reset_index(name='Prop')
)
Or use custom function:
df1 = (df.groupby('Country Name')[indicators]
.apply(lambda x: np.mean(x.isna().to_numpy(), axis=None))
.reset_index(name='Prop'))