How to sum up missing values per row in pandas dataframe

Time:03-08

I have a dataframe:

import pandas as pd

d = {
'Country': ["Austria", "Austria", "Belgium", "USA", "USA", "USA", "USA"], 
'Number2020': [15, None, 18, 20, 22, None,  30],
'Number2021': [20, 25, 18, None, None, None, 32],    
}

df = pd.DataFrame(data=d)
df

    Country   Number2020    Number2021
0   Austria   15.0          20.0
1   Austria   NaN           25.0
2   Belgium   18.0          18.0
3   USA       20.0          NaN
4   USA       22.0          NaN
5   USA       NaN           NaN
6   USA       30.0          32.0

and I want to count the NaN values for each country, e.g.:

    Country       Count_nans
    Austria       1
    
    USA           4

I have filtered the dataframe to keep only the rows that contain NaNs.

df_nan = df[df.Number2021.isna() | df.Number2020.isna()]

    Country   Number2020    Number2021
1   Austria   NaN           25.0
3   USA       20.0          NaN
4   USA       22.0          NaN
5   USA       NaN           NaN

So it looks like a groupby operation? I have tried this.

nasum2021 = df_nan['Number2021'].isna().sum()

df_nan['countNames2021'] = df_nan.groupby(['Number2021'])['Number2021'].transform('count').fillna(nasum2021)
df_nan

This gives me 1 NaN for Austria but 3 for the USA, while it should be 4, so that is not right. In my real dataframe, I have some 10 years and around 30 countries. Thank you!

CodePudding user response:

Solution that processes all columns except Country: first convert Country to the index, then test for missing values, aggregate with sum per country, and finally sum across the columns:

s = df.set_index('Country').isna().groupby('Country').sum().sum(axis=1)
print(s)
Country
Austria    1
Belgium    0
USA        4
dtype: int64
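
To see what each step of that one-liner contributes, the chain can be split into intermediate results. This is only a sketch on the sample data from the question; the names `flags` and `per_year` are made up for illustration and are not part of the original answer.

```python
import pandas as pd

d = {
    'Country': ["Austria", "Austria", "Belgium", "USA", "USA", "USA", "USA"],
    'Number2020': [15, None, 18, 20, 22, None, 30],
    'Number2021': [20, 25, 18, None, None, None, 32],
}
df = pd.DataFrame(data=d)

# Step 1: move Country into the index so only the year columns are tested,
# then mark every missing cell with True
flags = df.set_index('Country').isna()

# Step 2: per country, count the True flags in each year column
per_year = flags.groupby('Country').sum()

# Step 3: add the year columns together to get one total per country
s = per_year.sum(axis=1)

print(per_year)
print(s)
```

Because every column other than Country is tested, the same chain should carry over to a frame with more year columns without listing them by name.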

If you need to remove the 0 values, filter them out of the result, for example with boolean indexing.
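
One possible way to drop the zero counts, sketched here under the assumption that boolean indexing is what was meant (the original snippet is not preserved on this page). The names `s_nonzero` and `out` are illustrative, and `Count_nans` is the column name from the question's expected output.

```python
import pandas as pd

d = {
    'Country': ["Austria", "Austria", "Belgium", "USA", "USA", "USA", "USA"],
    'Number2020': [15, None, 18, 20, 22, None, 30],
    'Number2021': [20, 25, 18, None, None, None, 32],
}
df = pd.DataFrame(data=d)

# Total number of missing values per country, as in the answer above
s = df.set_index('Country').isna().groupby('Country').sum().sum(axis=1)

# Keep only countries that have at least one missing value
s_nonzero = s[s.ne(0)]

# Optional: turn the Series into a two-column frame like the one in the question
out = s_nonzero.reset_index(name='Count_nans')
print(out)
```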
