I have a dataframe with hundreds of columns and millions of rows, and I would like to check the number of missing values in each row of the dataframe.
Code:
df.isna().sum()
Currently, I'm analyzing the data with the code above, which gives me the number of missing values in each column. How can we get the number of missing values for each row?
Also, I would like a distribution plot of [column of rows] vs [number of missing values].
CodePudding user response:
As a first step, you can try:
df_nan=pd.DataFrame(df.isna().mean().reset_index()).rename(columns={"index": "columns", 0: "nan_pourcentage"}).sort_values(by='nan_pourcentage',ascending=False)
This shows which columns have the most or the fewest NaN values, and you can plot it.
You can get the overall fraction of NaN in your dataframe using: df.isna().mean().mean() (the result is between 0 and 1; multiply by 100 for a percentage).
And now, if you want the fraction of NaN per row:
for index in range(len(df.index)):
    # fraction of missing values in the row at this position
    print("NaN in row", index, ":", df.iloc[index].isna().mean())
Instead of printing, you can store the result in a dataframe.
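A minimal sketch of storing the result rather than printing it, assuming a small made-up DataFrame (the column names and values are illustrative only). Here `isna().mean(axis=1)` computes the per-row fraction in one vectorised call, which avoids the Python-level loop and should scale much better to millions of rows.

```python
import numpy as np
import pandas as pd

# Toy data standing in for the real DataFrame (hypothetical values).
df = pd.DataFrame({
    "a": [1.0, np.nan, 3.0],
    "b": [np.nan, np.nan, 6.0],
    "c": [7.0, 8.0, 9.0],
    "d": [np.nan, 1.0, 2.0],
})

# Fraction of missing values in each row, indexed like df.
row_nan_fraction = df.isna().mean(axis=1)

# Store it in its own DataFrame, sorted so the rows with the most NaN come first.
df_row_nan = (
    row_nan_fraction
    .rename("nan_fraction")
    .to_frame()
    .sort_values(by="nan_fraction", ascending=False)
)
print(df_row_nan)
```

The resulting frame keeps the original row index, so it can be joined back onto df if needed.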
CodePudding user response:
How can we get the number of missing values for each row?
You can use sum along the columns axis (axis=1):
df.isna().sum(axis=1)
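As a small illustration, assuming a made-up DataFrame, the per-row counts can also be used to pick out rows with many gaps. The threshold of 2 is an arbitrary choice for this example.

```python
import numpy as np
import pandas as pd

# Toy data (hypothetical values).
df = pd.DataFrame({
    "a": [1.0, np.nan, 3.0, np.nan],
    "b": [np.nan, np.nan, 6.0, 1.0],
    "c": [7.0, np.nan, 9.0, 2.0],
})

# Number of missing values in each row (axis=1 sums across the columns).
missing_per_row = df.isna().sum(axis=1)

# Rows with at least 2 missing values (threshold chosen arbitrarily).
sparse_rows = df[missing_per_row >= 2]

print(missing_per_row.tolist())
print(sparse_rows.index.tolist())
```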
distribution plot of [column of rows] vs [number of missing values].
If you mean the number of missing values in each column, df.isna().sum() already gives the result.
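If the question instead means how many rows have 0, 1, 2, ... missing values (one possible reading of it), a sketch along these lines might work. The toy data and the helper name `plot_missing_distribution` are hypothetical, and the plotting part assumes matplotlib is installed.

```python
import numpy as np
import pandas as pd

# Toy data (hypothetical values).
df = pd.DataFrame({
    "a": [1.0, np.nan, 3.0, np.nan],
    "b": [np.nan, np.nan, 6.0, 1.0],
    "c": [7.0, np.nan, 9.0, 2.0],
})

# Number of missing values in each row.
missing_per_row = df.isna().sum(axis=1)

# How many rows have 0, 1, 2, ... missing values.
distribution = missing_per_row.value_counts().sort_index()
print(distribution)


def plot_missing_distribution(dist):
    """Bar chart: x = missing values per row, y = number of rows."""
    # Imported here so the counts above work even without matplotlib.
    import matplotlib.pyplot as plt

    ax = dist.plot(kind="bar")
    ax.set_xlabel("Number of missing values in a row")
    ax.set_ylabel("Number of rows")
    plt.show()
```

Calling `plot_missing_distribution(distribution)` draws the chart. With millions of rows the aggregated counts stay small (at most one bar per possible count), so the plot itself should remain cheap.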