I am using Python pandas, I would appreciate the code in Python. But you can suggest a solution in Excel or SQL. I have two tables A and B. Table A has two ID columns by which I want to join table B. Table B has one ID column that either corresponds with one or the other ID column of Table A. Here is an example:
And this is the resulting table AB that I want:
CodePudding user response:
in sql
:
select * from tableA a
left join tableB b
on b.ID in (a.ID1, a.ID2)
CodePudding user response:
You can use pd.merge()
dropna()
isna()
pd.concat()
import pandas as pd
m1 = pd.merge(df_A, df_B, left_on='ID Column 1', right_on='ID Column', how='left')
m1_mismatches = m1[m1['ID Column'].isna()].dropna(axis=1)
m1_matches = m1[~m1['ID Column'].isna()]
m2 = pd.merge(m1_mismatches, df_B, left_on='ID Column 2', right_on='ID Column', how='left')
df_AB = pd.concat([m1_matches, m2]).drop(columns='ID Column').reset_index(drop=True)
print(df_AB)
The idea here is to do an initial "left join" between df_A
and df_B
, and then a second "left join" with the mismatches found in the first join. Finally we use pd.concat
to concat the results.
Output:
ID Column 1 | ID Column 2 | col 1 | col 2 | col 3 | col 4 | |
---|---|---|---|---|---|---|
0 | [email protected] | [email protected] | lol | 9479 | pee pee poo poo | 985023.0 |
1 | [email protected] | [email protected] | bla bla bla | 123 | bla bla bla hello | 124323.0 |
2 | [email protected] | [email protected] | harry potter | 2077 | NaN | NaN |