I have a DataFrame (df) like the following:
   Column1  Column2  ...
0        1        1
1     Null        1
...
I want to plot the count of null values in each column.
Currently, I am doing:
import matplotlib.pyplot as plt
df.isnull().sum().plot.bar()
plt.show()
The problem is that there are about 180 columns and most of them have 0 null values; I want to ignore those columns when plotting.
I tried the following, which doesn't seem to work:
df_null = df.loc[:, df.isnull().sum() > 0]
df_null.plot()
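The attempt above selects the right columns, but `df_null.plot()` then draws their values rather than their null counts. A minimal sketch of the fix is to filter the Series of counts instead of the DataFrame. The small frame below is made up and stands in for the real 180-column one.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs headless
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Hypothetical toy data standing in for the real DataFrame
df = pd.DataFrame({
    "a": [1, np.nan, 3],
    "b": [1, 1, 1],
    "c": [np.nan, np.nan, 1],
})

null_counts = df.isnull().sum()              # one count per column
null_counts = null_counts[null_counts > 0]   # keep only columns with nulls

ax = null_counts.plot.bar()
ax.set_ylabel("null count")
plt.show()
```

Column `b` has no nulls, so only `a` and `c` get a bar.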
Answer:
If you need the NaN count for each column that contains NaN, shown as a bar plot, the following code may help:
df.isna().sum()[df.isna().sum()>0].plot(kind='bar')
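The one-liner above calls `df.isna().sum()` twice. A sketch that computes the counts once, filtering with `Series.loc` and a callable, is shown below; sorting the bars so the columns with the most NaN come first is an addition of this sketch, not part of the answer. The toy frame is made up for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend for the sketch
import numpy as np
import pandas as pd

# Hypothetical toy data
df = pd.DataFrame({
    "x": [np.nan, 2, np.nan, 4],
    "y": [1, 2, 3, 4],
    "z": [np.nan, 2, 3, 4],
})

counts = (
    df.isna().sum()                  # NaN count per column
    .loc[lambda s: s > 0]            # drop columns without NaN
    .sort_values(ascending=False)    # largest counts first
)
ax = counts.plot(kind="bar")
```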
Answer:
Maybe try:
df_null = df.isnull().sum()
df_null[df_null > 0].to_frame('count').T.plot(kind='bar')
Or did you mean:
df.T[df.isnull().sum() > 0].plot()
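The two variants above answer different questions: the first plots null counts, the second plots the raw values of every column that contains a null. The `.T` in the second one is needed because `df[mask]` with a boolean Series selects rows. A sketch, assuming a toy frame, showing that `df.loc[:, mask]` picks the same columns without transposing:

```python
import numpy as np
import pandas as pd

# Hypothetical toy data
df = pd.DataFrame({
    "a": [1.0, np.nan, 3.0],
    "b": [1.0, 1.0, 1.0],
    "c": [np.nan, 5.0, 6.0],
})
mask = df.isnull().sum() > 0        # boolean Series indexed by column name

via_transpose = df.T[mask]          # rows of df.T = original columns with nulls
via_loc = df.loc[:, mask].T         # same columns, transposed only for comparison

pd.testing.assert_frame_equal(via_transpose, via_loc)
```

To draw one line per selected column, `df.loc[:, mask].plot()` is enough.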
Answer:
Subset the columns that contain at least one null into a separate DataFrame, then plot that DataFrame:
df_with_nulls = df[[col for col in df.columns if df[col].isna().sum() > 0]]
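The subset above still holds values, not counts, so one more step is needed before plotting. A sketch, assuming a toy frame, that builds the same subset with `df.isna().any()` (a vectorized column mask equivalent to the list comprehension) and then plots the per-column null counts:

```python
import matplotlib
matplotlib.use("Agg")  # headless backend for the sketch
import numpy as np
import pandas as pd

# Hypothetical toy data
df = pd.DataFrame({
    "p": [np.nan, 1, 2],
    "q": [0, 1, 2],
    "r": [np.nan, np.nan, np.nan],
})

# Same columns as the list comprehension, without a Python-level loop
df_with_nulls = df.loc[:, df.isna().any()]

# Plot the null count of each remaining column
ax = df_with_nulls.isna().sum().plot.bar()
```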
Answer:
Here is the helper I use in almost every notebook:
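import matplotlib.pyplot as plt
import pandas as pd

def plot_nas(df: pd.DataFrame):
    # Only plot if the DataFrame has at least one missing value
    if df.isnull().sum().sum() != 0:
        # Percentage of missing values per column
        na_df = (df.isnull().sum() / len(df)) * 100
        # Drop columns with no missing values, largest percentage first
        na_df = na_df.drop(na_df[na_df == 0].index).sort_values(ascending=False)
        missing_data = pd.DataFrame({'Missing Ratio %': na_df})
        missing_data.plot(kind="barh")
        plt.show()
    else:
        print('No NAs found')

plot_nas(df)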
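Inside `plot_nas`, `(df.isnull().sum() / len(df)) * 100` is the percentage of missing values per column. `df.isna().mean()` gives the same fraction directly, because the mean of a boolean column is the share of `True` values. A compact sketch of the same computation, assuming a toy frame:

```python
import matplotlib
matplotlib.use("Agg")  # headless backend for the sketch
import numpy as np
import pandas as pd

# Hypothetical toy data
df = pd.DataFrame({
    "u": [np.nan, 1, 2, 3],
    "v": [1, 2, 3, 4],
    "w": [np.nan, np.nan, 3, 4],
})

na_pct = df.isna().mean() * 100                            # % missing per column
na_pct = na_pct[na_pct > 0].sort_values(ascending=False)   # drop 0 %, largest first

ax = na_pct.to_frame("Missing Ratio %").plot(kind="barh")
```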
You can change the size of the graph as follows:
plot_width, plot_height = (16,18)
plt.rcParams['figure.figsize'] = (plot_width,plot_height)
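Setting `plt.rcParams['figure.figsize']` changes the default for every later figure in the session. If only one chart should be larger, pandas plotting methods also accept a `figsize` argument. A sketch with a hypothetical Series of null counts:

```python
import matplotlib
matplotlib.use("Agg")  # headless backend for the sketch
import pandas as pd

# Hypothetical null counts for illustration
counts = pd.Series({"col_a": 3, "col_b": 7})

ax = counts.plot.barh(figsize=(16, 18))  # size applies to this figure only
```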