I have two DataFrames, one called old and another called new.
Both DataFrames have multiple columns, but I am interested in the column called ADDTEXT. When I open the two files in Excel and compare the ADDTEXT columns, they are completely identical.
When I compare the columns in Python with new['ADDTEXT'] == old['ADDTEXT'], every element comes back False. When I do new['ADDTEXT'].equals(old['ADDTEXT']), it returns True.
Why don't both comparisons return True, given that both columns contain only NaN values?
Example output:
>>> new = pd.read_excel('3.8_self_input_data.xlsx')
>>>
>>>
>>> old = pd.read_excel('3.7_self_input_data.xlsx')
>>>
>>> old['ADDTEXT']
0 NaN
1 NaN
2 NaN
3 NaN
4 NaN
...
13630 NaN
13631 NaN
13632 NaN
13633 NaN
13634 NaN
Name: ADDTEXT, Length: 13635, dtype: object
>>>
>>> new['ADDTEXT']
0 NaN
1 NaN
2 NaN
3 NaN
4 NaN
...
13630 NaN
13631 NaN
13632 NaN
13633 NaN
13634 NaN
Name: ADDTEXT, Length: 13635, dtype: object
>>>
>>> new['ADDTEXT'] == old['ADDTEXT']
0 False
1 False
2 False
3 False
4 False
...
13630 False
13631 False
13632 False
13633 False
13634 False
Name: ADDTEXT, Length: 13635, dtype: bool
>>>
>>> new['ADDTEXT'].equals(old['ADDTEXT'])
True
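The same behaviour can be reproduced without the Excel files. This is a minimal sketch, assuming the columns hold nothing but missing values; the names old_col and new_col and the length of 5 are illustrative stand-ins for the 13635-row ADDTEXT columns.

```python
import numpy as np
import pandas as pd

# Two all-NaN object columns, standing in for old['ADDTEXT'] and new['ADDTEXT']
# (hypothetical names and length; the real columns come from read_excel)
old_col = pd.Series([np.nan] * 5, dtype=object, name="ADDTEXT")
new_col = pd.Series([np.nan] * 5, dtype=object, name="ADDTEXT")

# A bare NaN is never equal to another NaN
print(np.nan == np.nan)  # False

# Element-wise ==: every position holds NaN on both sides, so every result is False
elementwise = new_col == old_col
print(elementwise.tolist())  # [False, False, False, False, False]

# .equals(): NaNs in the same position are treated as equal
whole = new_col.equals(old_col)
print(whole)  # True
```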
Answer:
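NaN != NaN.
Under IEEE 754 floating-point rules, NaN compares unequal to every value, including another NaN, so == returns False at every position where both columns hold NaN. Series.equals() works differently: it treats NaNs in the same location as equal, which is why it returns True. If you want an element-wise result that also treats matching NaNs as equal, combine eq() with isna() on the two columns: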
(new['ADDTEXT'].eq(old['ADDTEXT']) | (new['ADDTEXT'].isna() & old['ADDTEXT'].isna()))
That reads: for each row, return True if the two values are equal or if both are NaN.
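Here is a minimal sketch of that mask in use, with small made-up DataFrames (the sample values and the extra ID column are hypothetical, not taken from the question's files). Reducing the mask with .all() gives a single True/False answer, and the same pattern applied to whole DataFrames covers the old == new comparison from the question.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data: ADDTEXT matches in rows 0 and 1, differs in row 2
old = pd.DataFrame({"ADDTEXT": [np.nan, "a", np.nan], "ID": [1, 2, 3]})
new = pd.DataFrame({"ADDTEXT": [np.nan, "a", "b"], "ID": [1, 2, 3]})

# Column level: True where the values are equal or both are NaN
col_mask = new["ADDTEXT"].eq(old["ADDTEXT"]) | (new["ADDTEXT"].isna() & old["ADDTEXT"].isna())
print(col_mask.tolist())  # [True, True, False]

# Whole-DataFrame version of the same idea (both frames must have
# identical row and column labels for eq() to align them)
frame_mask = new.eq(old) | (new.isna() & old.isna())
print(frame_mask.all().all())  # False, because row 2 of ADDTEXT differs
```

Chaining .all().all() first reduces each column to one boolean and then reduces those to a single value for the whole frame.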