I have a large dataset. I partitioned the data into training and test sets.
I found the missing values of the independent variables.
I want to count the number of columns that contain missing values. In this case, I should get 12 column names. So far I have only been able to get the total number of missing values across all columns.
Here is my attempt:
finding_missing_values = data.train.isnull().sum()
finding_missing_values
finding_missing_values.sum()
Is there a way to count the number of columns that have at least one missing value?
CodePudding user response:
You wrote
finding_missing_values.sum()
You were looking for
(finding_missing_values > 0).values.sum()
From .values we get a NumPy array.
The comparison gives us False / True values, which conveniently are treated as 0 / 1 by .sum()
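To make the 0 / 1 behaviour concrete, here is a minimal sketch on a small made-up DataFrame. The frame `toy` and its column names are assumptions for illustration only and stand in for the asker's training data.

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame standing in for the training data
toy = pd.DataFrame({
    "a": [1.0, np.nan, 3.0],     # one missing value
    "b": [np.nan, np.nan, 6.0],  # two missing values
    "c": [7.0, 8.0, 9.0],        # no missing values
})

# Per-column count of missing values
missing_per_column = toy.isnull().sum()

# Boolean Series: True where a column has at least one missing value
has_missing = missing_per_column > 0

# True counts as 1 and False as 0, so the sum is the number of such columns
n_columns_with_missing = int(has_missing.values.sum())
print(n_columns_with_missing)  # 2
```

Summing `missing_per_column` directly would instead give 3 here, the total number of missing cells, which is the result the question was getting.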
CodePudding user response:
Convert the per-column missing counts to a list, then count the non-zero values as follows.
finding_missing_values = (data.train.isnull().sum()).to_list()
number_of_missing_value_columns = sum(k > 0 for k in finding_missing_values)
print(number_of_missing_value_columns)
This should give:
12
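Since the question also mentions getting the column names, one possible way to list them is to filter the per-column counts and read off the index. This is only a sketch on a made-up frame; `toy` and its columns are hypothetical and not from the original data.

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame standing in for the training data
toy = pd.DataFrame({
    "a": [1.0, np.nan, 3.0],
    "b": [np.nan, np.nan, 6.0],
    "c": [7.0, 8.0, 9.0],
})

# Per-column count of missing values
missing_per_column = toy.isnull().sum()

# Keep only columns with a non-zero count and take their names
columns_with_missing = missing_per_column[missing_per_column > 0].index.tolist()

print(columns_with_missing)       # ['a', 'b']
print(len(columns_with_missing))  # 2
```

The length of that list is the same count as above, so both the names and the number come from one filtered Series.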