I have two large pandas DataFrames. The first contains IDs, some of which have been updated and some of which have not. I want to merge the two so that each row of the first includes the corresponding old and new ID as a comma-separated list.
DF1:
ID name
nfi23 sally
arb128 joe
mbi13 mary
DF2:
ID_old ID_updated
nfi23 wjm348
hji21 arb128
mbi13 ybm328
Desired:
ID name
nfi23, wjm348 sally
hji21, arb128 joe
mbi13, ybm328 mary
CodePudding user response:
Here is one way to do it (df is DF1 from the question):
# combine the old and updated IDs into a new column
df2['combined'] = df2['ID_old'] + "," + df2['ID_updated']
# melt to flatten df2, so old and updated IDs share a single 'ID' column
df3 = df2.melt('combined', value_name='ID')
# finally, merge df with the melted df2 (df3) and drop the helper columns
df4 = df.merge(df3,
               on='ID',
               how='left').drop(columns=['variable', 'ID'])
df4
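For reference, a self-contained sketch of the melt-and-merge approach on the sample data. The frame construction is an assumption reconstructed from the tables in the question, and df1 stands in for the df used above.

```python
import pandas as pd

# Sample data reconstructed from the question (assumed layout)
df1 = pd.DataFrame({"ID": ["nfi23", "arb128", "mbi13"],
                    "name": ["sally", "joe", "mary"]})
df2 = pd.DataFrame({"ID_old": ["nfi23", "hji21", "mbi13"],
                    "ID_updated": ["wjm348", "arb128", "ybm328"]})

# One 'combined' string per old/updated pair
df2["combined"] = df2["ID_old"] + "," + df2["ID_updated"]

# melt stacks ID_old and ID_updated into a single 'ID' column,
# so every ID (old or updated) sits next to its combined string
df3 = df2.melt("combined", value_name="ID")

# The left merge keeps every row of df1; then drop the helper columns
df4 = df1.merge(df3, on="ID", how="left").drop(columns=["variable", "ID"])
print(df4)
```

Because the melted frame lists both the old and the updated ID of every pair, the merge finds a match whichever version of the ID df1 happens to hold.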
Or use map instead:
# combine the old and updated IDs into a new column
df2['combined'] = df2['ID_old'] + "," + df2['ID_updated']
# melt to flatten df2, so old and updated IDs share a single 'ID' column
df3 = df2.melt('combined', value_name='ID')
# finally, map each ID in df to its combined value
df['combined'] = df['ID'].map(df3.set_index('ID')['combined'])
df.drop(columns='ID')
name combined
0 sally nfi23,wjm348
1 joe hji21,arb128
2 mary mbi13,ybm328
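A runnable sketch of the map variant on the same sample data follows. The frame construction and the names `lookup` and `result` are assumptions for illustration. The `drop_duplicates` call is an added guard, not part of the answer above: as far as I know, `map` with a Series needs a unique index on that Series, so an ID that shows up in more than one pair would otherwise cause trouble.

```python
import pandas as pd

# Sample data reconstructed from the question (assumed layout)
df1 = pd.DataFrame({"ID": ["nfi23", "arb128", "mbi13"],
                    "name": ["sally", "joe", "mary"]})
df2 = pd.DataFrame({"ID_old": ["nfi23", "hji21", "mbi13"],
                    "ID_updated": ["wjm348", "arb128", "ybm328"]})

df2["combined"] = df2["ID_old"] + "," + df2["ID_updated"]

# Lookup Series: index is every known ID (old or updated),
# value is the combined string of the pair it belongs to
lookup = (df2.melt("combined", value_name="ID")
             .drop_duplicates("ID")      # keep the first pair per ID (assumed policy)
             .set_index("ID")["combined"])

# assign returns a new frame, so df1 itself is left unchanged
result = df1.assign(combined=df1["ID"].map(lookup)).drop(columns="ID")
print(result)
```

IDs in df1 that appear nowhere in df2 come out as NaN in `combined`; whether to keep, fill, or drop those rows depends on the data.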
CodePudding user response:
You can set ID_old as the index of df2 and then map it onto df1 while combining them:
mapper = df2.set_index("ID_old")["ID_updated"]
# IDs with no entry in ID_old (e.g. arb128) fall back to themselves
df1['ID'] = df1['ID'] + ", " + df1['ID'].map(mapper).fillna(df1['ID'])
df1
Output:
ID name
0 nfi23, wjm348 sally
1 arb128, arb128 joe
2 mbi13, ybm328 mary
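The mapper above is keyed on ID_old only, so a row whose ID is already the updated one (arb128 in the sample) never picks up its old ID (hji21). A possible sketch that looks up in both directions and reproduces the desired output from the question is shown here. The names `old_to_new`, `new_to_old`, `old`, and `new` are introduced for illustration, the frames are reconstructed from the question, and the sketch assumes each ID appears at most once in each column of df2.

```python
import pandas as pd

# Sample data reconstructed from the question (assumed layout)
df1 = pd.DataFrame({"ID": ["nfi23", "arb128", "mbi13"],
                    "name": ["sally", "joe", "mary"]})
df2 = pd.DataFrame({"ID_old": ["nfi23", "hji21", "mbi13"],
                    "ID_updated": ["wjm348", "arb128", "ybm328"]})

# Two lookups, one per direction
old_to_new = df2.set_index("ID_old")["ID_updated"]
new_to_old = df2.set_index("ID_updated")["ID_old"]

# If the ID is an updated one, recover its old ID; otherwise keep it as the old ID
old = df1["ID"].map(new_to_old).fillna(df1["ID"])
# If the ID is an old one, look up its update; otherwise keep it as the updated ID
new = df1["ID"].map(old_to_new).fillna(df1["ID"])

df1["ID"] = old + ", " + new
print(df1)
```

An ID that appears in neither column of df2 ends up paired with itself under this scheme, so it may be worth filtering those rows first if that is not wanted.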