pandas dropna drops more than isna counts

Time:03-04

I've got a 3-column dataset with 7100 rows.

data.isna().sum() shows that one column contains 117 NaN values and the others contain 0. data.isnull().sum() gives the same result: 117 for one column and 0 for the others.

data.dropna(inplace=True) drops 351 rows. Can anyone explain this to me? Am I doing anything wrong?

Edit:

I have now examined the deleted rows. 351 rows were deleted, and dropped.isna().sum().sum() shows a total of 117 NaN values among them.

dropped[~dropped['description'].isna()] shows an empty table. So the result seems to be correct as far as I can see.

Now I'm just curious how the difference in counting occurs.

Sadly I'm not able/allowed to provide a data sample.

CodePudding user response:

data.isna().sum() returns the number of NaN values in each column of your dataframe, and data.dropna() drops every row that contains at least one NaN. You can check the affected rows directly by creating a subset, for example nan_rows=dataframe[dataframe.columnNameWithNanValues.isna()], and then looking at the shape of that subset. Next, use .dropna() without the inplace=True argument, so the original dataframe is kept and you can compare the row counts before and after.
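
A minimal sketch of that check, assuming a small made-up frame since no sample of the real data is available. The columns id and title and all values are hypothetical; only the column name description comes from the question.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the real dataset (3 columns, NaN in one of them).
data = pd.DataFrame({
    "id": range(6),
    "title": list("abcdef"),
    "description": ["x", np.nan, "y", np.nan, "z", "w"],
})

# NaN count per column (a Series indexed by column name).
nan_per_column = data.isna().sum()

# Number of rows that contain at least one NaN; these are the rows dropna() removes.
rows_with_nan = data.isna().any(axis=1).sum()

# Non-inplace dropna keeps `data` intact, so both row counts can be compared.
cleaned = data.dropna()
rows_dropped = len(data) - len(cleaned)

print(nan_per_column)
print(rows_with_nan, rows_dropped)
```

If only one column contains NaN, rows_with_nan and rows_dropped should both match that column's NaN count.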

CodePudding user response:

Found the solution, and it's pretty simple. I have three columns, and one of them contains 117 NaN values. Dropping 117 rows across 3 columns removes a total of 117 × 3 = 351 fields. Since I used df.size to measure how much was deleted, and df.size counts fields (rows × columns) rather than rows, I got 351 "deleted fields", which is totally correct.
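
The numbers can be reproduced with a rough sketch like the one below. The frame is synthetic: the column names a and b and all values are made up, and only the shape (7100 rows, 3 columns, 117 NaN in description) follows the question.

```python
import numpy as np
import pandas as pd

n_rows, n_nan = 7100, 117

# Synthetic frame: 3 columns, with NaN only in 'description'.
data = pd.DataFrame({
    "a": np.arange(n_rows),
    "b": np.arange(n_rows),
    "description": [np.nan] * n_nan + ["text"] * (n_rows - n_nan),
})

cleaned = data.dropna()

# DataFrame.size counts cells (rows * columns), not rows.
size_diff = data.size - cleaned.size

# len() and shape[0] count rows.
row_diff = len(data) - len(cleaned)

print(size_diff)  # 351
print(row_diff)   # 117
```

Comparing len(df) or df.shape[0] before and after dropna() gives the number of dropped rows, while df.size gives that number multiplied by the column count.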
