This is an abstraction sample of a very large dataset where I have two dataframes.In df1
only some codes, in df2
codes connected to a name.
I need now a new column in df1
called names which fit the according codes.
import pandas as pd
codes1 = ['prz', 'sfd', 'mbn','kdg', 'kdg', 'hsb', 'aqw','zzu', 'zkl', 'psb', 'qpb', 'gsh', 'lgr', 'yxc', 'esx', 'vbn', 'hgf', 'rew', 'prz', 'jjm']
codes2 = ['gsh', 'kdg', 'esx', 'prz']
names = ['jane','', '','john', 'john', '', '','', '', '', '', 'charlie', '', '', 'mary', '', '', '', 'jane', '']
names2 =['john', 'mary', 'charlie', 'jane']
df1 = pd.DataFrame(codes1, columns={'codes1'})
df2 = pd.DataFrame({'codes2': codes2, 'names': names2})
df1:
codes1
0 prz
1 sfd
2 mbn
3 kdg
4 kdg
5 hsb
6 aqw
7 zzu
8 zkl
9 psb
10 qpb
11 gsh
12 lgr
13 yxc
14 esx
15 vbn
16 hgf
17 rew
18 prz
19 jjm
df2:
codes2 names
0 gsh john
1 kdg mary
2 esx charlie
3 prz jane
desired output:
codes1 names
0 prz jane
1 sfd
2 mbn
3 kdg john
4 kdg john
5 hsb
I tried to do it with merge
but I doesn't work:
df1.merge(df2, on=codes1, how='outer')
How to achieve my desired output?
CodePudding user response:
Use Series.map
:
df1["names"] = df1["codes1"].map(df2.set_index("codes2").squeeze()).fillna("")
print(df1.head())
Output
codes1 names
0 prz jane
1 sfd
2 mbn
3 kdg mary
4 kdg mary
CodePudding user response:
Use a dictionary to map the values:
import pandas as pd
codes1 = ['prz', 'sfd', 'mbn','kdg', 'kdg', 'hsb', 'aqw','zzu', 'zkl', 'psb', 'qpb', 'gsh', 'lgr', 'yxc', 'esx', 'vbn', 'hgf', 'rew', 'prz', 'jjm']
codes2 = ['gsh', 'kdg', 'esx', 'prz']
names = ['jane','', '','john', 'john', '', '','', '', '', '', 'charlie', '', '', 'mary', '', '', '', 'jane', '']
names2 =['john', 'mary', 'charlie', 'jane']
df1 = pd.DataFrame(codes1, columns={'codes1'})
df2 = pd.DataFrame({'codes2': codes2, 'names': names2})
lookup_dict = dict(zip(df2['codes2'], df2['names']))
df1['names'] = df1['codes1'].map(lookup_dict)
Output:
print(df1)
codes1 names
0 prz jane
1 sfd NaN
2 mbn NaN
3 kdg mary
4 kdg mary
5 hsb NaN
6 aqw NaN
7 zzu NaN
8 zkl NaN
9 psb NaN
10 qpb NaN
11 gsh john
12 lgr NaN
13 yxc NaN
14 esx charlie
15 vbn NaN
16 hgf NaN
17 rew NaN
18 prz jane
19 jjm NaN
CodePudding user response:
You can try:
df1 = df1.merge(df2, how='left', left_on='codes1', right_on='codes2').drop(columns=['codes2'])
Output:
codes1 names
0 prz jane
1 sfd NaN
2 mbn NaN
3 kdg mary
4 kdg mary
5 hsb NaN