Home > Software engineering >  Pandas Merge dataframe with multiple columns based on condition
Pandas Merge dataframe with multiple columns based on condition

Time:05-05

Given df1 and df2:

df1 = pd.DataFrame({
    'Key': ['k1', 'k1', 'k1', 'k2', 'k3'], 
    'Num': [1, 2, 3, 1, 2],
    'A': ['a1', 'a2', 'a3', 'a4', 'a5']
})

display(df1)

df2 = pd.DataFrame({
    'Key': ['k1', 'k1', 'k2', 'k3'], 
    'Num': [1, 2, 1, 1],
    'X': ['x1', 'x2', 'x3', 'x4']
})

display(df2)

df1:
Key Num A
0   k1  1   a1
1   k1  2   a2
2   k1  3   a3
3   k2  1   a4
4   k3  2   a5

df2:
Key Num X
0   k1  1   x1
1   k1  2   x2
2   k2  1   x3
3   k3  1   x4

Expected Output:

Key Num A   X
0   k1  1   a1  x1
1   k1  2   a2  x2
2   k1  3   a3  x1
3   k2  1   a4  x3
4   k3  2   a5  x4

I would like to merge df2 into df1 on columns 'Key' and 'Num' such that if Num doesn't match, then the value with same key and num 1 from df2 will be matched if available.

CodePudding user response:

IIUC, you can merge then fillna with another merge (here as map):

s = df1['Key'].map(df2.drop_duplicates('Key').set_index('Key')['X'])

df3 = (df1
 .merge(df2, on=['Key', 'Num'], how='left')
 .fillna({'X': s})   
)

output:

  Key  Num   A   X
0  k1    1  a1  x1
1  k1    2  a2  x2
2  k1    3  a3  x1
3  k2    1  a4  x3
4  k3    2  a5  x4

CodePudding user response:

  • Right merge df2 to df1, then merge that to df2 by just Key where Num == 1.
  • Fill in the missing values in X_x with the X_y values.
  • Drop excess columns and restore naming:
df3 = df2.merge(df1, how='right').merge(df2[df2.Num==1], on='Key')
df3['X_x'] = df3[['X_x', 'X_y']].bfill(axis=1)['X_x']
df3.drop(['Num_y', 'X_y'], axis=1, inplace=True)
df3.columns = ['Key', 'Num', 'X', 'A']
display(df3)

Output:

  Key  Num   X   A
0  k1    1  x1  a1
1  k1    2  x2  a2
2  k1    3  x1  a3
3  k2    1  x3  a4
4  k3    2  x4  a5
  • Related