I have two dataframes:
df1 = pd.DataFrame({'id1':[3,2,1,4,5],'id2':[8,7,6,9,10]})
df2 = pd.DataFrame({'id3':[6,2,8,5,4],'Name':['a','b','c','e','d']})
df1
id1 id2
3 8
2 7
1 6
4 9
5 10
df2
id3 Name
6 a
2 b
8 c
5 e
4 d
Description
df1 consists of 2 columns (id1 and id2) which represents unique values for both the columns.
df2 consists of 2 columns (id3 and name) which also represents unique values.
id3 column of df2 is made up of either of values from df1 rows and its order is also random. Example - Lets take the first value of id3 column which is 6. It is made by picking a random value the third row of df1. Similarly 2 is made from picking random value from second row of df1. and so on..
Now I want to merge these 2 dataframes by id3 of df2 and either of id1 or id2 of df1 based on availability on id3 column. So, the final dataframe would look like this:
id1 id2 id3 Name
3 8 8 c
2 7 2 b
1 6 6 a
4 9 4 d
5 10 5 e
Can anyone help me on this?
CodePudding user response:
You can merge 2 times: first on id1/id3 and next on id2/id3 then join the 2 dataframes:
Update
I want to restore the order of values of df1 in out dataframe as well.
out = pd.concat([df1.reset_index().merge(df2, left_on='id1', right_on='id3'),
df1.reset_index().merge(df2, left_on='id2', right_on='id3')]) \
.sort_values('index').drop(columns='index').reset_index(drop=True)
Output:
>>> out
id1 id2 id3 Name
0 1 6 6 a
1 2 7 2 b
2 3 8 8 c
3 4 9 4 d
4 5 10 5 e