When updating one DataFrame based on info from second one the NaN's are not transferred. With below code
import pandas as pd
import numpy as np
df1 = pd.DataFrame({'A':[1,2,3,4,5], 'B':[4,5,6,7,8]})
df2 = pd.DataFrame({'A':[1,2,3], 'B':[7,np.nan,6]})
df1.update(df2)
The new DataFrame will look as below:
So the value from df2 is not transferred as this is a NaN
Normally this is probably an expected outcome but not in the particular case that I'm working on. For the moment how I deal with it is that I replace all NaNs with a value or string that signalizes to me that it's something I need to check but I was wondering if there is an actual solution to pass the NaN as an update?
CodePudding user response:
What you could do is a bit more involved, but solves the problem as well without using fillna
or so:
df3 = df1.merge(df2, how='left', suffixes=('___',''), on=['A'], indicator=True)
df3['finalB'] = df3.apply(lambda x: x['B'] if x['_merge'] == 'both' else x['B___'], axis=1)
df3[['A','finalB']]
Output:
A finalB
0 1.0 7.0
1 2.0 NaN
2 3.0 6.0
3 4.0 7.0
4 5.0 8.0
CodePudding user response:
Concat dataframes then remove duplicated index:
>>> pd.concat([df2, df1]).loc[lambda x: ~x.index.duplicated()].sort_index()
A B
0 1 7.0
1 2 NaN
2 3 6.0
3 4 7.0
4 5 8.0