Home > database >  How to merge two dataframes based on different columns in one dataframe?
How to merge two dataframes based on different columns in one dataframe?

Time:09-29

I have two dataframes:

df1 = pd.DataFrame({'id1':[3,2,1,4,5],'id2':[8,7,6,9,10]})
df2 = pd.DataFrame({'id3':[6,2,8,5,4],'Name':['a','b','c','e','d']})

df1

id1   id2
3     8
2     7
1     6
4     9
5     10

df2


id3   Name
6     a
2     b
8     c
5     e
4     d

Description

  • df1 consists of 2 columns (id1 and id2) which represents unique values for both the columns.

  • df2 consists of 2 columns (id3 and name) which also represents unique values.

  • id3 column of df2 is made up of either of values from df1 rows and its order is also random. Example - Lets take the first value of id3 column which is 6. It is made by picking a random value the third row of df1. Similarly 2 is made from picking random value from second row of df1. and so on..

Now I want to merge these 2 dataframes by id3 of df2 and either of id1 or id2 of df1 based on availability on id3 column. So, the final dataframe would look like this:

id1 id2 id3 Name
3   8   8   c
2   7   2   b
1   6   6   a
4   9   4   d
5   10  5   e

Can anyone help me on this?

CodePudding user response:

You can merge 2 times: first on id1/id3 and next on id2/id3 then join the 2 dataframes:

Update

I want to restore the order of values of df1 in out dataframe as well.

out = pd.concat([df1.reset_index().merge(df2, left_on='id1', right_on='id3'),
                 df1.reset_index().merge(df2, left_on='id2', right_on='id3')]) \
        .sort_values('index').drop(columns='index').reset_index(drop=True)

Output:

>>> out
   id1  id2  id3 Name
0    1    6    6    a
1    2    7    2    b
2    3    8    8    c
3    4    9    4    d
4    5   10    5    e
  • Related