Home > Enterprise >  Pandas Update on DataFrame to carryover NaN from second DataFrame
Pandas Update on DataFrame to carryover NaN from second DataFrame

Time:02-25

When updating one DataFrame based on info from second one the NaN's are not transferred. With below code

import pandas as pd
import numpy as np
df1 = pd.DataFrame({'A':[1,2,3,4,5], 'B':[4,5,6,7,8]})
df2 = pd.DataFrame({'A':[1,2,3], 'B':[7,np.nan,6]})
df1.update(df2)

The new DataFrame will look as below:

enter image description here

So the value from df2 is not transferred as this is a NaN

enter image description here

Normally this is probably an expected outcome but not in the particular case that I'm working on. For the moment how I deal with it is that I replace all NaNs with a value or string that signalizes to me that it's something I need to check but I was wondering if there is an actual solution to pass the NaN as an update?

enter image description here

CodePudding user response:

What you could do is a bit more involved, but solves the problem as well without using fillna or so:

df3 = df1.merge(df2, how='left', suffixes=('___',''), on=['A'], indicator=True)
df3['finalB'] = df3.apply(lambda x: x['B'] if x['_merge'] == 'both' else x['B___'], axis=1)
df3[['A','finalB']]

Output:

     A  finalB
0  1.0     7.0
1  2.0     NaN
2  3.0     6.0
3  4.0     7.0
4  5.0     8.0

CodePudding user response:

Concat dataframes then remove duplicated index:

>>> pd.concat([df2, df1]).loc[lambda x: ~x.index.duplicated()].sort_index()
   A    B
0  1  7.0
1  2  NaN
2  3  6.0
3  4  7.0
4  5  8.0
  • Related