I have two dataframes, df_1 and df_2, where df_1 has several columns of "codes" and df_2 has the definitions for all of those codes:
df_1 = pd.DataFrame({
'Age': [42, 35, 64, 53],
'Code 1': [1234, 3452, 9583, 8753],
'Code 2': [3857, np.nan, np.nan, 1234]})
df_2 = pd.DataFrame({
'Code': [3452, 8753, 3857, 1234, 9583],
'Code Def':['a', 'b', 'c', 'd', 'e']})
How do I create a new column in df_1 that contains the definitions of all the codes from df_2 to look something like this?
Age Code 1 Code 2 Code def
42 1234 3857 d, c
35 3452 NaN a
64 9583 NaN e
53 8753 1234 b, d
I've tried using merge()
to combine the two dataframes, but that doesn't work since I want to join on multiple columns in df_1 and just one column in df_2. I also tried creating empty columns in df_1 and filling them using if
statements, but that got quite complicated.
Thanks!
CodePudding user response:
You could first stack
and groupby
agg
to form the new column, then merge
with the original dataset:
s = df_2.set_index(['Code'])['Code Def']
df_1.merge(df_1.set_index('Age')
.stack().map(s)
.groupby(level='Age').agg(','.join)
.rename('Code def'),
left_on='Age', right_index=True
)
output:
Age Code 1 Code 2 Code def
0 42 1234 3857.0 d,c
1 35 3452 NaN a
2 64 9583 NaN e
3 53 8753 1234.0 b,d
CodePudding user response:
Here anther approach
x= dict(zip(df_2["Code"], df_2['Code Def']))
tmp = df_1[["Code 1","Code 2"]].replace({"Code 1":x,'Code 2':x}).fillna('')
df_1["Code Def"] = tmp["Code 1"] " " tmp["Code 2"]
output
Age Code 1 Code 2 Code Def
0 42 1234 3857.0 d c
1 35 3452 NaN a
2 64 9583 NaN e
3 53 8753 1234.0 b d