merging pandas df based on 'contains' to complete df

Time:08-08

I have two large pandas DataFrames. The first contains IDs, some of which have already been updated and some of which have not. I want to merge the two so that each row's ID becomes a comma-separated list of the corresponding old and new ID.

DF1:

ID         name 
nfi23     sally
arb128    joe
mbi13     mary

DF2

ID_old     ID_updated
nfi23        wjm348
hji21        arb128
mbi13        ybm328

desired:

ID                 name
nfi23, wjm348      sally
hji21, arb128      joe
mbi13, ybm328      mary

CodePudding user response:

Here is one way to do it (df is DF1 from the question):

# combine the old and updated IDs into a new column
df2['combined'] = df2['ID_old'] + "," + df2['ID_updated']

# melt to flatten df2, so every old and updated ID gets its own row
df3 = df2.melt('combined', value_name='ID')

# finally, merge df with the melted df2 (df3)
df4 = df.merge(df3,
        on='ID',
        how='left').drop(columns=['variable', 'ID'])
df4

Or use map instead of merge:

# combine the old and updated IDs into a new column
df2['combined'] = df2['ID_old'] + "," + df2['ID_updated']

# melt to flatten df2, so every old and updated ID gets its own row
df3 = df2.melt('combined', value_name='ID')

# finally, map each ID to its combined value
df['combined'] = df['ID'].map(df3.set_index('ID')['combined'])
df.drop(columns='ID')

    name    combined
0   sally   nfi23,wjm348
1   joe     hji21,arb128
2   mary    mbi13,ybm328
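
For reference, the melt-and-map steps above can be put together into a self-contained sketch using the sample data from the question. The function name combine_ids and the sep parameter are made up for illustration, and the drop_duplicates call is an added assumption (map needs a unique index, which the real data may not guarantee):

```python
import pandas as pd

# Sample frames from the question.
df1 = pd.DataFrame({"ID": ["nfi23", "arb128", "mbi13"],
                    "name": ["sally", "joe", "mary"]})
df2 = pd.DataFrame({"ID_old": ["nfi23", "hji21", "mbi13"],
                    "ID_updated": ["wjm348", "arb128", "ybm328"]})


def combine_ids(df1, df2, sep=","):
    """Hypothetical helper: attach 'old<sep>updated' to every row of df1."""
    # Build the combined string once per row of df2 (df2 itself is not modified).
    pairs = df2.assign(combined=df2["ID_old"] + sep + df2["ID_updated"])
    # Melt so that both the old and the updated ID point at the same string.
    lookup = pairs.melt(id_vars="combined",
                        value_vars=["ID_old", "ID_updated"],
                        value_name="ID")
    # Assumption: if an ID occurs more than once, keep its first pairing.
    lookup = lookup.drop_duplicates("ID").set_index("ID")["combined"]
    out = df1.copy()
    out["combined"] = out["ID"].map(lookup)
    return out.drop(columns="ID")


result = combine_ids(df1, df2)
print(result)
```

IDs in df1 that appear in neither column of df2 end up as NaN in the combined column.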

CodePudding user response:

You can set ID_old as the index of df2, then map it onto df1 and concatenate the result with the original ID:

mapper = df2.set_index("ID_old")["ID_updated"]
df1['ID'] = df1['ID'] + ", " + df1['ID'].map(mapper)

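df1 Output:

     ID             name
0   nfi23, wjm348   sally
1   NaN             joe
2   mbi13, ybm328   mary

Note that this only looks up IDs found in ID_old. arb128 is already an updated ID, so map returns NaN for it and the concatenated value is NaN as well; rows whose ID appears in ID_updated need a second lookup.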