How to plot count of null values for each column in Pandas Dataframe

Time:10-12

I have a dataframe (df) like the following:

    Column1 Column2 ...
0      1       1    
1    Null      1
.
.

I want to plot the count of null values in each column.

Currently, I am doing:

    df.isnull().sum().plot.bar()
    plt.show()

[Figure: count of null values in each column]

The problem is that there are about 180 columns and most of them have 0 null values. I want to ignore such columns while plotting.

I tried the following, which doesn't seem to work:

    df_null = df.loc[: ,df.isnull().sum() > 0]
    df_null.plot()

[Figure: new plot]

CodePudding user response:

If you need the NaN count for each column that contains NaN, shown as a bar plot, the following code may help:

    df.isna().sum()[df.isna().sum() > 0].plot(kind='bar')
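
A self-contained sketch of this approach on a small made-up frame; the column names and values are illustrative assumptions, not the asker's data. It computes the counts once and reuses them for the filter, which avoids running `isna().sum()` twice.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Hypothetical data: columns "a" and "c" contain NaN, column "b" does not.
df = pd.DataFrame({
    "a": [1.0, np.nan, 3.0],
    "b": [1, 1, 1],
    "c": [np.nan, np.nan, 2.0],
})

null_counts = df.isna().sum()            # NaN count per column
nonzero = null_counts[null_counts > 0]   # keep only columns that have NaN

ax = nonzero.plot(kind="bar")
ax.set_ylabel("null count")
plt.tight_layout()
# Call plt.show() to display the figure when running as a script.
```

Only "a" and "c" end up on the x-axis here, because "b" has no missing values and is filtered out before plotting.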

CodePudding user response:

Maybe try:

    df_null = df.isnull().sum()
    df_null[df_null > 0].to_frame('count').T.plot()

Or do you mean:

    df.T[df.isnull().sum() > 0].plot()

CodePudding user response:

Subset the columns that contain a null into a separate dataframe, and then plot that df:

    df_with_nulls = df[[col for col in df.columns if df[col].isna().sum() > 0]]
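
The subset still holds the original cell values, so calling `.plot()` on it directly draws those values rather than the null counts, which is why the attempt in the question did not give the expected chart. A minimal sketch of the remaining step, using a made-up frame (the column names and data are assumptions):

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Hypothetical data standing in for the real 180-column frame.
df = pd.DataFrame({
    "x": [1.0, np.nan, 3.0, np.nan],
    "y": [1, 2, 3, 4],
    "z": [np.nan, 5.0, 6.0, 7.0],
})

# Keep only the columns that contain at least one null.
df_with_nulls = df[[col for col in df.columns if df[col].isna().sum() > 0]]

# Count the nulls in the subset, then plot the counts as bars.
counts = df_with_nulls.isna().sum()
ax = counts.plot.bar()
ax.set_ylabel("null count")
plt.tight_layout()
# Call plt.show() to display the figure when running as a script.
```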

CodePudding user response:

I would like to offer you my foolproof solution that I use in almost every notebook:

    import matplotlib.pyplot as plt
    import pandas as pd

    def plot_nas(df: pd.DataFrame):
        # Plot the percentage of missing values per column, skipping complete columns.
        if df.isnull().sum().sum() != 0:
            na_df = (df.isnull().sum() / len(df)) * 100
            na_df = na_df.drop(na_df[na_df == 0].index).sort_values(ascending=False)
            missing_data = pd.DataFrame({'Missing Ratio %': na_df})
            missing_data.plot(kind="barh")
            plt.show()
        else:
            print('No NAs found')

    plot_nas(df)


You can change the size of the graph as follows:

    plot_width, plot_height = (16, 18)
    plt.rcParams['figure.figsize'] = (plot_width, plot_height)
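
Setting `plt.rcParams` changes the default size for every figure created afterwards. If only one chart should be resized, pandas plotting methods also accept a `figsize` argument. A small sketch with made-up data (the column names are illustrative only):

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Hypothetical frame: "a" is half missing, "b" is entirely missing.
df = pd.DataFrame({"a": [1.0, np.nan], "b": [np.nan, np.nan]})
ratio = df.isnull().mean() * 100  # percentage of missing values per column

# figsize is (width, height) in inches and applies to this figure only.
ax = ratio.sort_values(ascending=False).plot(kind="barh", figsize=(16, 18))
ax.set_xlabel("Missing Ratio %")
plt.tight_layout()
```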