I have a dataframe A with columns like so:
ACCOUNT_NAME | SFDC_ACCOUNT_NAME | COMPANY_NAME |
---|---|---|
Acme Inc | Acme, Inc. | Acme |
Donut Heaven | None | Doughnut Heaven |
Super Foods | Sooper Foods | None |
I want to merge on additional columns but I am not sure if this additional data was captured using ACCOUNT_NAME
, SFDC_ACCOUNT_NAME
, or COMPANY_NAME
. This data looks like the table below. There is a join key column that could represent either ACCOUNT_NAME
, SFDC_ACCOUNT_NAME
, or COMPANY_NAME
.
CAPTURED_COMPANY_NAME | value1 | value2 |
---|---|---|
Acme Inc | 2 | 3 |
Sooper Foods | 6 | 7 |
Doughnut Heaven | 5 | 8 |
I want the final table to look like this:
ACCOUNT_NAME | SFDC_ACCOUNT_NAME | COMPANY_NAME | value1 | value2 |
---|---|---|---|---|
Acme Inc | Acme, Inc. | Acme | 2 | 3 |
Donut Heaven | None | Doughnut Heaven | 5 | 8 |
Super Foods | Sooper Foods | None | 6 | 7 |
I could merge this second dataset onto the first three times (one for each join key) but then of course columns value1
and value2
would be repeated three times. What is the best way to achieve this?
CodePudding user response:
Given:
# df
ACCOUNT_NAME SFDC_ACCOUNT_NAME COMPANY_NAME
0 Acme Inc Acme, Inc. Acme
1 Donut Heaven NaN Doughnut Heaven
2 Super Foods Sooper Foods NaN
# df1
CAPTURED_COMPANY_NAME value1 value2
0 Acme Inc 2 3
1 Sooper Foods 6 7
2 Doughnut Heaven 5 8
Doing:
# Merge each individually, and concat the results:
out = pd.concat([df.merge(df1, left_on=x, right_on='CAPTURED_COMPANY_NAME') for x in df.columns], ignore_index=True)
# Left Merge:
# out = df.merge(pd.concat([df.merge(df1, left_on=x, right_on='CAPTURED_COMPANY_NAME') for x in df.columns], ignore_index=True), how='left')
print(out)
Output:
ACCOUNT_NAME SFDC_ACCOUNT_NAME COMPANY_NAME CAPTURED_COMPANY_NAME value1 value2
0 Acme Inc Acme, Inc. Acme Acme Inc 2 3
1 Super Foods Sooper Foods NaN Sooper Foods 6 7
2 Donut Heaven NaN Doughnut Heaven Doughnut Heaven 5 8