How to merge partial data into a given df-CodePudding

How can I merge partial data into a given df, without changing unknown values? Here is a minimal example:

>>> import pandas as pd
>>> import numpy as np
>>> data = {
...     "Feat_A": ["INVALID", "INVALID", "INVALID"],
...     "Feat_B": ["INVALID", "INVALID", "INVALID"],
...     "Key": [12, 25, 99],
... }
>>> 
>>> origin = pd.DataFrame(data=data)
>>> data = {"Feat_A": [1, np.nan], "Feat_B": [np.nan, 2], "Key": [12, 99]}
>>> new = pd.DataFrame(data=data)
>>> origin = origin.merge(
...     new[["Key", "Feat_A", "Feat_B"]],
...     on="Key",
...     how="left",
... )
>>> 
>>> origin
  Feat_A_x Feat_B_x  Key  Feat_A_y  Feat_B_y
0  INVALID  INVALID   12       1.0       NaN
1  INVALID  INVALID   25       NaN       NaN
2  INVALID  INVALID   99       NaN       2.0

This is what I am looking for:

#    Feat_A   Feat_B    Key  
# 0  1.0      INVALID   12
# 1  INVALID  INVALID   25   
# 2  INVALID  2.0       99

CodePudding user response：

First set "INVALID" cells to NAN then you can set key as index and use fillna method to fill the values from new to origin.

import numpy as np
origin = origin.map({'INVALID':np.nan})
originNew=origin.set_index('Key').fillna(new.set_index("Key"))

CodePudding user response：

Found pretty solution here: Overwrite columns in DataFrames of different sizes pandas

df_result = df2.set_index('ids').combine_first(df1.set_index('ids'))
df_result.reset_index()

Furthermore, df.update() seems to do something like that, but I did not figure out how to take care of different sizes.