Home > OS >  Python - Pandas - DROPNA(subset) deleting value for no apparent reasons?
Python - Pandas - DROPNA(subset) deleting value for no apparent reasons?

Time:11-28

I'm cleaning some data and I've been struggling with one thing.

I have a dataframe with 7740 rows and 68 columns.

Most of the columns contains Nan values.

What i'm interested in, is to remove NaN values when it is NaN in those two columns : [SERIAL_ID],[NUMBER_ID]

Example :

SERIAL_ID NUMBER_ID
8RY68U4R NaN
8756ERT5 8759321
NaN NaN
NaN 7896521
7EY68U4R NaN
95856ERT5 988888
NaN NaN
NaN 4555555

Results

SERIAL_ID NUMBER_ID
8RY68U4R NaN
8756ERT5 8759321
NaN 7896521
7EY68U4R NaN
95856ERT5 988888
NaN 4555555

Removing rows when NaN is in the two columns.

I've used the followings to do so :

df.dropna(subset=['SERIAL_ID', 'NUMBER_ID'], how='all', inplace=True)

When I use this on my dataframe with 68 columns the result I get is this one :

SERIAL_ID NUMBER_ID
NaN NaN
NaN NaN
NaN NaN
NaN 7896521
NaN NaN
95856ERT5 NaN
NaN NaN
NaN 4555555

I tried with a copy of the dataframe with only 3 columns, it is working fine.

It is somehow working (I can tel cause I have an identical ID in another column) but remove some of the value, and I have no idea why.

Please help I've been struggling the whole day with this. Thanks again.

CodePudding user response:

I don't know why it only works for 3 columns and not for 68 originals. However, we can obtain desired output in other way.

use boolean indexing:

df[df[['SERIAL_ID', 'NUMBER_ID']].notnull().any(axis=1)]

CodePudding user response:

You can use boolean logic or simple do something like this for any given column:

import numpy as np
import pandas as pd

# sample dataframe
d = {'SERIAL_ID':['8RY68U4R', '8756ERT5', np.nan, np.nan],
    'NUMBER_ID':[np.nan, 8759321, np.nan ,7896521]}
df = pd.DataFrame(d)

# apply logic to columns
df['nans'] = df['NUMBER_ID'].isnull() * df['SERIAL_ID'].isnull()

# filter columns
df_filtered = df[df['nans']==False]
print(df_filtered)

which returns this:

  SERIAL_ID  NUMBER_ID   nans
0  8RY68U4R        NaN  False
1  8756ERT5  8759321.0  False
3       NaN  7896521.0  False
  • Related