Given df1 and df2:
df1 = pd.DataFrame({
'Key': ['k1', 'k1', 'k1', 'k2', 'k3'],
'Num': [1, 2, 3, 1, 2],
'A': ['a1', 'a2', 'a3', 'a4', 'a5']
})
display(df1)
df2 = pd.DataFrame({
'Key': ['k1', 'k1', 'k2', 'k3'],
'Num': [1, 2, 1, 1],
'X': ['x1', 'x2', 'x3', 'x4']
})
display(df2)
df1:
Key Num A
0 k1 1 a1
1 k1 2 a2
2 k1 3 a3
3 k2 1 a4
4 k3 2 a5
df2:
Key Num X
0 k1 1 x1
1 k1 2 x2
2 k2 1 x3
3 k3 1 x4
Expected Output:
Key Num A X
0 k1 1 a1 x1
1 k1 2 a2 x2
2 k1 3 a3 x1
3 k2 1 a4 x3
4 k3 2 a5 x4
I would like to merge df2 into df1 on columns 'Key' and 'Num' such that if Num doesn't match, then the value with same key and num 1 from df2 will be matched if available.
CodePudding user response:
IIUC, you can merge
then fillna
with another merge (here as map
):
s = df1['Key'].map(df2.drop_duplicates('Key').set_index('Key')['X'])
df3 = (df1
.merge(df2, on=['Key', 'Num'], how='left')
.fillna({'X': s})
)
output:
Key Num A X
0 k1 1 a1 x1
1 k1 2 a2 x2
2 k1 3 a3 x1
3 k2 1 a4 x3
4 k3 2 a5 x4
CodePudding user response:
- Right merge
df2
todf1
, then merge that todf2
by justKey
whereNum == 1
. - Fill in the missing values in
X_x
with theX_y
values. - Drop excess columns and restore naming:
df3 = df2.merge(df1, how='right').merge(df2[df2.Num==1], on='Key')
df3['X_x'] = df3[['X_x', 'X_y']].bfill(axis=1)['X_x']
df3.drop(['Num_y', 'X_y'], axis=1, inplace=True)
df3.columns = ['Key', 'Num', 'X', 'A']
display(df3)
Output:
Key Num X A
0 k1 1 x1 a1
1 k1 2 x2 a2
2 k1 3 x1 a3
3 k2 1 x3 a4
4 k3 2 x4 a5