How to check missing values in each row of dataframe

I have a dataframe with hundreds of columns and millions of rows, and I would like to check the missing values in each row of the dataframe.

Code:

df.isna().sum()

Currently I'm analyzing the data with the code above, which gives me the number of missing values in each column. How can I get the missing values for each row?

I would also like a distribution plot of [column of rows] vs [number of missing values].

Answer:

As a first step, you can do:

df_nan = (
    df.isna().mean()
    .reset_index()
    .rename(columns={"index": "columns", 0: "nan_pourcentage"})
    .sort_values(by="nan_pourcentage", ascending=False)
)

This shows which columns have the most or the fewest NaN values (as fractions between 0 and 1), and you can plot it.
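A minimal sketch of the column summary on a small made-up dataframe (the data and the column names "a", "b", "c" are hypothetical, for illustration only). Note that despite the name nan_pourcentage, the values are fractions between 0 and 1.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: 4 rows, 3 columns with some NaN values
df = pd.DataFrame({
    "a": [1.0, np.nan, 3.0, 4.0],
    "b": [np.nan, np.nan, np.nan, 1.0],
    "c": [1.0, 2.0, 3.0, 4.0],
})

# Fraction of NaN per column, sorted from most to least missing
df_nan = (
    df.isna().mean()
    .reset_index()
    .rename(columns={"index": "columns", 0: "nan_pourcentage"})
    .sort_values(by="nan_pourcentage", ascending=False)
)
print(df_nan)
```

Assuming matplotlib is installed, df_nan.plot.bar(x="columns", y="nan_pourcentage") draws this as a bar chart.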

To get the overall share of NaN values in the whole dataframe, use df.isna().mean().mean(). This returns a fraction between 0 and 1, so multiply by 100 for a percentage.
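Because every column has the same number of rows, the mean of the per-column fractions equals the total NaN count divided by the total number of cells. A quick check on hypothetical toy data:

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: 4 NaN values out of 12 cells
df = pd.DataFrame({
    "a": [1.0, np.nan, 3.0, 4.0],
    "b": [np.nan, np.nan, np.nan, 1.0],
    "c": [1.0, 2.0, 3.0, 4.0],
})

overall = df.isna().mean().mean()          # mean of the per-column fractions
direct = df.isna().sum().sum() / df.size   # total NaN count / total cell count
print(overall, direct)
```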

And to get the share of NaN values per row:

for index in range(len(df.index)):
    print("NaN in row", index, ":", df.iloc[index].isna().mean())

Instead of printing, you can store the results in a dataframe. Be aware that looping over millions of rows with iloc is slow.
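A vectorized sketch that stores the per-row results in a dataframe without a Python loop: df.isna().mean(axis=1) gives the same per-row fraction as the loop in a single call. The toy data and the result column names nan_count and nan_fraction are hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data
df = pd.DataFrame({
    "a": [1.0, np.nan, 3.0, 4.0],
    "b": [np.nan, np.nan, np.nan, 1.0],
    "c": [1.0, 2.0, 3.0, 4.0],
})

na = df.isna()  # boolean mask, computed once and reused
row_nan = pd.DataFrame({
    "nan_count": na.sum(axis=1),      # number of missing values per row
    "nan_fraction": na.mean(axis=1),  # share of missing values per row
})
print(row_nan)
```

The result keeps the original row index, so it can be joined back to df if needed.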

Answer:

How we can get the missing values w.r.t each row.

You can sum across the columns (axis=1), which gives the number of missing values in each row:

df.isna().sum(axis=1)
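A small sketch on hypothetical toy data: the per-row counts are indexed like df, so they can also be used to filter rows, for example to keep only rows with at least one missing value (the threshold of 0 is just an example).

```python
import numpy as np
import pandas as pd

# Hypothetical toy data
df = pd.DataFrame({
    "a": [1.0, np.nan, 3.0, 4.0],
    "b": [np.nan, np.nan, np.nan, 1.0],
    "c": [1.0, 2.0, 3.0, 4.0],
})

missing_per_row = df.isna().sum(axis=1)   # one count per row, same index as df
rows_with_nan = df[missing_per_row > 0]   # rows with at least one NaN
print(missing_per_row.tolist())
print(rows_with_nan.index.tolist())
```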

distribution plot of [column of rows] vs [number of missing values].

If you mean the number of missing values in each column, df.isna().sum() already gives that.
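If the plot is instead meant to show how missing values are distributed across rows (how many rows have 0, 1, 2, ... missing values), one option is to count the per-row totals. This reading of the question is an assumption, and the data below is hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data
df = pd.DataFrame({
    "a": [1.0, np.nan, 3.0, 4.0],
    "b": [np.nan, np.nan, np.nan, 1.0],
    "c": [1.0, 2.0, 3.0, 4.0],
})

missing_per_row = df.isna().sum(axis=1)
# Number of rows for each count of missing values, ordered by that count
distribution = missing_per_row.value_counts().sort_index()
print(distribution)
```

Assuming matplotlib is installed, distribution.plot.bar() draws this as a bar chart, and missing_per_row.plot.hist() shows a histogram directly.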