Home > Back-end >  How to map values across dataframes based on match
How to map values across dataframes based on match

Time:10-12

This is an abstraction sample of a very large dataset where I have two dataframes.In df1 only some codes, in df2 codes connected to a name. I need now a new column in df1 called names which fit the according codes.

import pandas as pd

codes1 = ['prz', 'sfd', 'mbn','kdg', 'kdg',  'hsb', 'aqw','zzu', 'zkl', 'psb', 'qpb', 'gsh', 'lgr', 'yxc', 'esx', 'vbn', 'hgf', 'rew', 'prz', 'jjm']
codes2 = ['gsh', 'kdg', 'esx', 'prz']
names = ['jane','', '','john', 'john', '', '','', '', '', '', 'charlie', '', '', 'mary', '', '', '', 'jane', '']
names2 =['john', 'mary', 'charlie', 'jane'] 

df1 = pd.DataFrame(codes1, columns={'codes1'})
df2 = pd.DataFrame({'codes2': codes2, 'names': names2})

df1:

   codes1
0     prz
1     sfd
2     mbn
3     kdg
4     kdg
5     hsb
6     aqw
7     zzu
8     zkl
9     psb
10    qpb
11    gsh
12    lgr
13    yxc
14    esx
15    vbn
16    hgf
17    rew
18    prz
19    jjm

df2:

  codes2    names
0    gsh     john
1    kdg     mary
2    esx  charlie
3    prz     jane

desired output:

   codes1    names
0     prz     jane
1     sfd         
2     mbn         
3     kdg     john
4     kdg     john
5     hsb  

I tried to do it with merge but I doesn't work:

df1.merge(df2, on=codes1, how='outer')

How to achieve my desired output?

CodePudding user response:

Use Series.map:

df1["names"] = df1["codes1"].map(df2.set_index("codes2").squeeze()).fillna("")
print(df1.head())

Output

  codes1 names
0    prz  jane
1    sfd      
2    mbn      
3    kdg  mary
4    kdg  mary

CodePudding user response:

Use a dictionary to map the values:

import pandas as pd

codes1 = ['prz', 'sfd', 'mbn','kdg', 'kdg',  'hsb', 'aqw','zzu', 'zkl', 'psb', 'qpb', 'gsh', 'lgr', 'yxc', 'esx', 'vbn', 'hgf', 'rew', 'prz', 'jjm']
codes2 = ['gsh', 'kdg', 'esx', 'prz']
names = ['jane','', '','john', 'john', '', '','', '', '', '', 'charlie', '', '', 'mary', '', '', '', 'jane', '']
names2 =['john', 'mary', 'charlie', 'jane'] 

df1 = pd.DataFrame(codes1, columns={'codes1'})
df2 = pd.DataFrame({'codes2': codes2, 'names': names2})

lookup_dict = dict(zip(df2['codes2'], df2['names']))


df1['names'] = df1['codes1'].map(lookup_dict)

Output:

print(df1)
   codes1    names
0     prz     jane
1     sfd      NaN
2     mbn      NaN
3     kdg     mary
4     kdg     mary
5     hsb      NaN
6     aqw      NaN
7     zzu      NaN
8     zkl      NaN
9     psb      NaN
10    qpb      NaN
11    gsh     john
12    lgr      NaN
13    yxc      NaN
14    esx  charlie
15    vbn      NaN
16    hgf      NaN
17    rew      NaN
18    prz     jane
19    jjm      NaN

CodePudding user response:

You can try:

df1 = df1.merge(df2, how='left', left_on='codes1', right_on='codes2').drop(columns=['codes2'])

Output:

   codes1    names
0     prz     jane
1     sfd      NaN
2     mbn      NaN
3     kdg     mary
4     kdg     mary
5     hsb      NaN
  • Related