I have a ref table df_ref
like this:
col1 col2 ref
a b a,b
c d c,d
I need to create a new column in another table based on ref table.The table like this:
col1 col2
a b
a NULL
NULL b
a NULL
a NULL
c d
c NULL
NULL NULL
The output table df_org
looks like:
col1 col2 ref
a b a,b
a NULL a,b
NULL b a,b
a NULL a,b
a NULL a,b
c d c,d
c NULL c,d
NULL NULL NULL
If any column value in col1
and col2
can find in ref table, it will use the ref col in ref table. If col1
and col2
are NULL
, So they cannot find anything in ref table, just return NULL
. I use this code, but it doesn't work.
df_org['ref']=np.where(((df_org['col1'].isin(df_ref['ref'])) |
(df_org['col2'].isin(df_ref['ref']))
), df_ref['ref'], 'NULL')
ValueError: operands could not be broadcast together with shapes
CodePudding user response:
You want to perform two merges and combine them:
df_org = (
df.merge(df_ref.drop('col2', axis=1), on='col1', how='left')
.combine_first(df.merge(df_ref.drop('col1', axis=1), on='col2', how='left'))
)
output:
col1 col2 ref
0 a b a,b
1 a NaN a,b
2 NaN b a,b
3 a NaN a,b
4 a NaN a,b
5 c d c,d
6 c NaN c,d
7 NaN NaN NaN
CodePudding user response:
( df.merge(df_ref[['col1', 'ref']], how="outer", left_on='col1', right_on='col1') # add column for col1 refs
.merge(df_ref[['col2', 'ref']], how="outer", left_on='col2', right_on='col2', # add column for col2 refs
suffixes=('_col1', '_col2')) # set suffixes to both ref columns
.assign(ref=lambda x: x['ref_col1'].fillna(x['ref_col2'])) # add column from 'ref_col1' and fill 'NaN' from 'ref_col2'
.drop(['ref_col1', 'ref_col2'], axis=1) # drop 'ref_col1' and 'ref_col2' columns
)
results in
col1 col2 ref
0 a b a,b
1 NaN b a,b
2 a NaN a,b
3 a NaN a,b
4 a NaN a,b
5 NaN NaN NaN
6 c NaN c,d
7 c d c,d