Python - Pandas compare columns with NaN returns False

Time:12-15

I have 2 DataFrames, one called old and the other called new. Both have multiple columns, but I am interested in the column called ADDTEXT. When I open the 2 files in Excel and compare the ADDTEXT columns, they are identical. But when I compare them in Python with new['ADDTEXT'] == old['ADDTEXT'], every element comes back False, while new['ADDTEXT'].equals(old['ADDTEXT']) returns True.

Why don't both comparisons return True, given that both columns contain only NaN values?

Example output:

>>> new = pd.read_excel('3.8_self_input_data.xlsx')
>>>
>>>
>>> old = pd.read_excel('3.7_self_input_data.xlsx')
>>>
>>> old['ADDTEXT']
0        NaN
1        NaN
2        NaN
3        NaN
4        NaN
        ...
13630    NaN
13631    NaN
13632    NaN
13633    NaN
13634    NaN
Name: ADDTEXT, Length: 13635, dtype: object
>>>
>>> new['ADDTEXT']
0        NaN
1        NaN
2        NaN
3        NaN
4        NaN
        ...
13630    NaN
13631    NaN
13632    NaN
13633    NaN
13634    NaN
Name: ADDTEXT, Length: 13635, dtype: object
>>>
>>> new['ADDTEXT'] == old['ADDTEXT']
0        False
1        False
2        False
3        False
4        False
         ...
13630    False
13631    False
13632    False
13633    False
13634    False
Name: ADDTEXT, Length: 13635, dtype: bool
>>>
>>> new['ADDTEXT'].equals(old['ADDTEXT'])
True
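The same behavior can be reproduced without the Excel files. This is a minimal sketch that assumes two small object-dtype Series filled with NaN are a fair stand-in for the ADDTEXT columns read from the spreadsheets (old_col and new_col are hypothetical names):

```python
import numpy as np
import pandas as pd

# Hypothetical stand-ins for old['ADDTEXT'] and new['ADDTEXT']:
# object dtype, every value missing.
old_col = pd.Series([np.nan] * 5, dtype=object, name="ADDTEXT")
new_col = pd.Series([np.nan] * 5, dtype=object, name="ADDTEXT")

# Element-wise comparison: a missing value never compares equal.
elementwise = new_col == old_col
print(elementwise.tolist())     # [False, False, False, False, False]

# equals() compares the whole Series and treats NaNs in the same position as equal.
print(new_col.equals(old_col))  # True
```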

Answer:

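NaN != NaN. Under IEEE 754 floating-point rules, NaN compares unequal to everything, including itself, so NaN == NaN is False. The element-wise == comparison follows that rule, which is why every row is False. Series.equals(), on the other hand, treats NaN values in the same position as equal, so it returns True.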
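The rule is visible at the scalar level, before pandas is involved at all. A short sketch using only the standard library and NumPy:

```python
import math

import numpy as np

nan = float("nan")

# IEEE 754: NaN compares unequal to everything, including itself.
print(nan == nan)         # False
print(nan != nan)         # True
print(np.nan == np.nan)   # False

# Test for NaN with an explicit check instead of ==.
print(math.isnan(nan))    # True
print(np.isnan(np.nan))   # True
```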

If you need a per-row result instead of the single boolean that .equals() returns, combine eq() with isna() on the two columns:

(new['ADDTEXT'].eq(old['ADDTEXT']) | (new['ADDTEXT'].isna() & old['ADDTEXT'].isna()))

That reads: for each row, return True if the two values are equal or if both are NaN.
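The expression above can be wrapped in a small helper, reduced with .all() to get one answer for the whole column, and applied to entire DataFrames as well (the old == new case). This is a sketch, not a library function: the helper name nan_aware_eq and the sample data are hypothetical, and because isna() also flags None, any pair of missing values counts as equal:

```python
import numpy as np
import pandas as pd


def nan_aware_eq(a, b):
    """Hypothetical helper: element-wise equality where two missing values match.

    Works for two Series or two DataFrames with the same labels.
    """
    return a.eq(b) | (a.isna() & b.isna())


# Hypothetical sample data standing in for the spreadsheets.
old = pd.DataFrame({"ADDTEXT": [np.nan, np.nan, "x"], "ID": [1, 2, 3]})
new = pd.DataFrame({"ADDTEXT": [np.nan, np.nan, "y"], "ID": [1, 2, 3]})

col_mask = nan_aware_eq(new["ADDTEXT"], old["ADDTEXT"])
print(col_mask.tolist())         # [True, True, False]
print(bool(col_mask.all()))      # False, because row 2 differs

# Whole-frame version, summarized per column.
frame_mask = nan_aware_eq(new, old)
print(frame_mask.all().to_dict())
```

Reducing with .all() gives the same kind of single True/False answer as .equals(), while the unreduced mask shows which rows differ.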
