I'm trying to merge 2 or more dataframes on different columns in one pass. For example let's assume we have 3 datasets:
df1
id col_a col_b col_c ...
1 3 0 1
2 4 1 1
3 3 1 2
4 0 0 0
5 1 2 0
6 1 2 0
.
.
.
df2
id col_d id_2
1 3 a
2 4 b
3 3 c
3 0 d
4 1 e
5 1 f
.
.
.
df3
id_2 col_3
a 3
b 4
c 3
d 0
e 1
f 1
.
.
.
I want to chain connect the 3 data frames on columns id
between df1 and df2 and then connect that to df3 on id_2
. I know how to create a new dataframe after the first merge and then merge the result with df3, but is there a way to chain this in one function?
CodePudding user response:
Chain the operation. Here I use outer. You can change to what is most preferable now that you did not state.
df1.merge(df2, how='outer', on='id').merge(df3, how='outer', on='id_2')
id col_a col_b col_c col_d id_2 col_3
0 1 3 0 1 3.0 a 3.0
1 2 4 1 1 4.0 b 4.0
2 3 3 1 2 3.0 c 3.0
3 3 3 1 2 0.0 d 0.0
4 4 0 0 0 1.0 e 1.0
5 5 1 2 0 1.0 f 1.0
6 6 1 2 0 NaN NaN NaN