I've got a 3-column dataset with 7100 rows.
data.isna().sum()
shows that one column contains 117 NaN values and the others 0.
data.isnull().sum()
also shows 117 for that column and 0 for the other columns.
data.dropna(inplace=True)
drops 351 rows. Can anyone explain this to me? Am I doing anything wrong?
Edit:
I now examined the deleted rows. 351 rows were deleted, and dropped.isna().sum().sum()
shows a total of 117 NaN values.
dropped[~dropped['description'].isna()]
shows an empty table. So the result seems to be correct as far as I can see.
Now I'm just curious how the difference in counting occurs.
Sadly I'm not able/allowed to provide a data sample.
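Roughly, the check above looks like this (a minimal sketch, with data being the original 7100-row frame, not the exact code I ran):

# keep a copy of the rows that contain at least one NaN before dropping them
dropped = data[data.isna().any(axis=1)].copy()
data.dropna(inplace=True)
print(dropped.isna().sum().sum())               # total NaN count in the dropped rows: 117
print(dropped[~dropped['description'].isna()])  # empty frame: every dropped row has its NaN in 'description'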
CodePudding user response:
data.isna().sum() returns the number of NaN values per column (chain another .sum() for the grand total), and data.dropna() drops every row that contains at least one NaN. You can check those rows explicitly by creating a subset, for example nan_rows = dataframe[dataframe.columnNameWithNanValues.isna()], and then looking at its shape. After that, call .dropna() without the inplace=True argument so you can compare the result against the original dataframe.
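A minimal sketch of that check, assuming 'description' is the column with the NaN values and data is the original frame:

nan_rows = data[data['description'].isna()]
print(nan_rows.shape)           # (117, 3) -> 117 affected rows, not 351

cleaned = data.dropna()         # without inplace=True, data stays untouched
print(len(data), len(cleaned))  # 7100 vs 6983 rows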
CodePudding user response:
Found the solution. Pretty simple...
I've got three columns, and one of them contains 117 NaN values. Dropping those 117 rows across 3 columns removes a total of 351 fields. Since I used df.size to measure what was deleted, and .size counts fields rather than rows, I got 351 "deleted fields", which is completely correct.
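A tiny made-up frame shows the same effect (a sketch, not my real data):

import numpy as np
import pandas as pd

df = pd.DataFrame({'a': [1, 2, 3], 'b': [4, np.nan, 6], 'c': [7, 8, 9]})

size_before = df.size            # 9 fields (3 rows x 3 columns)
df.dropna(inplace=True)
size_after = df.size             # 6 fields

print(size_before - size_after)  # 3 -> one dropped row counted as 3 fields
print(len(df))                   # 2 rows remain

Same logic as above: 117 dropped rows x 3 columns = 351 fields.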