I have two dataframes which I need to merge/lookup values:
df1
Key | Value |
---|---|
123 | |
456 | |
789 | |
101 | 0.3 |
df2
Key | Value |
---|---|
123 | 0.1 |
456 | 0.2 |
key = distinct value. I need to look up df2 based on the Key column.
Expected result
Key | Value |
---|---|
123 | 0.1 |
456 | 0.2 |
789 | |
101 | 0.3 |
I have tried following:
merged_df = pd.merge(
left = df1,
right = df2,
on = 'Key',
how = 'outer',
)
This creates an additional column 'Value_y' in merged df, while I need to get the data in the original df1 'Value' column.
How could I tweak the code, or perhaps there is another function for this task? Also currently 'Key' value is 3 combined values (product code and two dates), perhaps it's possible to merge on several columns, instead of creating an additional 'Key' column for merging?
CodePudding user response:
Concatenate the two dataframes, then groupby Key
and call first
on the DataFrame Groupby object:
>>> pd.concat([df1, df2]).groupby('Key', sort=False).first().reset_index()
Key Value
0 123 0.1
1 456 0.2
2 789 NaN
3 101 0.3
CodePudding user response:
here is one way to do it using mask and map
map the key in the reference DF, and if value is not null, assign it to value in first DF
(df['Value']=df['Value'].mask(
(df['Key'].map(df2.set_index('Key').Value)).notna(),
df['Key'].map(df2.set_index('Key').Value)
))
df
OR using loc
df.loc[df['Value'].isna(), 'Value'] = df['Key'].map(df2.set_index('Key').Value)
df
Key Value
0 123 0.1
1 456 0.2
2 789 NaN
3 101 0.3
CodePudding user response:
Try with update
df1.set_index('Key', inplace=True)
df1.update(df2.set_index('Key'))
df1.reset_index()