Here is the situation: I have two pandas data frames:
TABLE 1 (`table1`):

| name | alias | col3 |
|---|---|---|
| str | str | str |

TABLE 2 (`table2`):

| name_or_alias | col2 |
|---|---|
| str | str |

- Every value in table1.name and table1.alias is unique: no value repeats within either column, and no value appears in both columns.
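Any approach that matches on either column depends on this uniqueness, so it may be worth checking up front. A minimal sketch, using a hypothetical `table1` with made-up sample values, stacks both key columns and tests `Series.is_unique`:

```python
import pandas as pd

# Hypothetical stand-in for table1; the values are made up for illustration
table1 = pd.DataFrame({'name': ['a', 'b'], 'alias': ['a2', 'b2'], 'col3': ['x', 'y']})

# Stack both key columns into one Series: it is unique only if no value repeats
# within `name`, within `alias`, or across the two columns
keys = pd.concat([table1['name'], table1['alias']], ignore_index=True)
keys_are_unique = keys.is_unique
print(keys_are_unique)
```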
I need to do a left join with table2 on the left, but each value in table2.name_or_alias may match either table1.name OR table1.alias.
So, if I do:
`table2.merge(table1, how='left', left_on='name_or_alias', right_on='name')`,
I only get some of the matches. If I do:
`table2.merge(table1, how='left', left_on='name_or_alias', right_on='alias')`,
I also only get some of the matches. I need a sort of IF statement that first checks one column for a match and then checks the other. I looked for a way to merge on either of two columns in pandas but could not find one (passing a list to `left_on`/`right_on` requires all the listed columns to match, not either one).
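The "check name first, then alias" logic can be sketched by translating every alias to its canonical name before a single ordinary merge. This is one possible technique, not the method from the answer below; the toy frames are hypothetical and rely on names and aliases being disjoint:

```python
import pandas as pd

# Hypothetical toy data: 'a' is a name, 'b2' is an alias, 'zz' matches nothing
table1 = pd.DataFrame({'name': ['a', 'b'], 'alias': ['a2', 'b2'], 'col3': ['x', 'y']})
table2 = pd.DataFrame({'name_or_alias': ['a', 'b2', 'zz'], 'col2': ['p', 'q', 'r']})

# The "IF" step: replace any alias with its canonical name; values that are
# already names (or match nothing) pass through unchanged
alias_to_name = dict(zip(table1['alias'], table1['name']))
key = table2['name_or_alias'].replace(alias_to_name)

# With every key expressed as a name, one left join on `name` is enough;
# the temporary `_key` column is dropped afterwards
out = (table2.assign(_key=key)
             .merge(table1, how='left', left_on='_key', right_on='name')
             .drop(columns='_key'))
print(out)
```

Rows whose key matches neither column (here `'zz'`) are kept with NaN in table1's columns, as a left join should.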
Answer:
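Use two `merge` calls, one for each key column, with table2 (`df2`) on the left so every table2 row is kept. Then `concat` the results and remove duplicated index labels, so each table2 row keeps the match from whichever column found one:

```python
import pandas as pd

# Left-join df2 (table2) to df1 (table1) on `name`, then on `alias`
by_name = df2.merge(df1, how='left', left_on='name_or_alias', right_on='name')
by_alias = df2.merge(df1, how='left', left_on='name_or_alias', right_on='alias')
# Matched rows first, the full `name` join last as a fallback; keep one row per df2 row
out = pd.concat([by_name.dropna(subset=['name']), by_alias.dropna(subset=['alias']), by_name])
out = out[~out.index.duplicated(keep='first')].sort_index()
print(out)
# Output
  name_or_alias col2 name alias col3
0           str  str  str   str  str
```

Because the keys in df1 are unique, each left merge returns exactly one row per df2 row, in df2's order, so the two results share the same index and deduplicating on it is safe.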
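An alternative sketch, not part of the original answer: since every name and alias is unique, table1 can be expanded into a lookup table with one row per possible key, after which a single left merge on `name_or_alias` does the whole job with no deduplication step. The toy frames are hypothetical; `validate='many_to_one'` makes pandas raise a `MergeError` if the uniqueness assumption does not hold.

```python
import pandas as pd

# Hypothetical toy data, same shape as the question's tables
table1 = pd.DataFrame({'name': ['a', 'b'], 'alias': ['a2', 'b2'], 'col3': ['x', 'y']})
table2 = pd.DataFrame({'name_or_alias': ['a', 'b2', 'zz'], 'col2': ['p', 'q', 'r']})

# Build a long lookup: each table1 row appears once keyed by its name
# and once keyed by its alias
lookup = pd.concat([table1.assign(name_or_alias=table1['name']),
                    table1.assign(name_or_alias=table1['alias'])],
                   ignore_index=True)

# One left join; validate= checks that each key occurs at most once in `lookup`
out = table2.merge(lookup, how='left', on='name_or_alias', validate='many_to_one')
print(out)
```

The trade-off is a lookup table twice the size of table1 in exchange for running one merge instead of two.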