Merge on NaN - The bug is the behavior I want. Should I worry about a future correction?

Time:02-19

In pandas, np.nan != np.nan, yet for now, when merging two DataFrames, rows whose key is NaN are matched with each other.

As reported in the question "Why does pandas merge on NaN?", the expected behavior would be not to match on NaN. The question is also discussed on the pandas issue tracker.

From It_is_chris:

# merge example
import numpy as np
import pandas as pd

df = pd.DataFrame({'col1': [np.nan, 'match'], 'col2': [1, 2]})
df2 = pd.DataFrame({'col1': [np.nan, 'no match'], 'col3': [3, 4]})
pd.merge(df, df2, on='col1')

    col1    col2    col3
0   NaN      1       3

Knowing that, I need my code to merge on NaN as well. I could rely on this quirk in pandas, but could the behavior change in the future and break my code?

What is the best option to prevent that?

Thanks

CodePudding user response:

As you correctly pointed out, pandas may stop matching NaN keys in a future version. Other tools already treat missing keys differently: in SQL, for example, NULL never compares equal to NULL, so a join does not match rows on NULL keys.

The easiest future-proof solution is to replace NaN with "NA" or a similar placeholder string that does not occur in your data. You can convert it back to NaN after merging if required.

df = pd.DataFrame({'col1':[np.nan, 'match'], 'col2':[1,2]}).fillna("NA")
df2 = pd.DataFrame({'col1':[np.nan, 'no match'], 'col3':[3,4]}).fillna("NA")
pd.merge(df,df2, on='col1')
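The snippet above fills every column and leaves the placeholder in the result. A minimal sketch of a variant that fills only the key column and restores NaN afterwards could look like this. The helper name merge_matching_nan and the SENTINEL value are made up for illustration, not part of pandas, and the sketch assumes a single key column whose real values never equal the sentinel.

```python
import numpy as np
import pandas as pd

# Hypothetical placeholder; assumed never to appear as a real key value.
SENTINEL = "__MISSING__"


def merge_matching_nan(left, right, on, **kwargs):
    """Merge on one key column so that NaN keys match each other explicitly."""
    # Fill only the key column, so NaN in other columns is left untouched.
    left_filled = left.assign(**{on: left[on].fillna(SENTINEL)})
    right_filled = right.assign(**{on: right[on].fillna(SENTINEL)})
    merged = pd.merge(left_filled, right_filled, on=on, **kwargs)
    # Turn the placeholder back into a missing value in the merged key column.
    merged[on] = merged[on].mask(merged[on] == SENTINEL)
    return merged


df = pd.DataFrame({"col1": [np.nan, "match"], "col2": [1, 2]})
df2 = pd.DataFrame({"col1": [np.nan, "no match"], "col3": [3, 4]})
result = merge_matching_nan(df, df2, on="col1")
print(result)
```

Because the match no longer depends on how pd.merge treats NaN, the result should stay the same even if that behavior is changed later. Extra keyword arguments such as how='left' are passed straight through to pd.merge.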