I have these dataframes:
df_1 =
Col1 Col2 Col3
0 2021-01-02 A200 strings1
1 2021-01-03 B300 strings4
2 2021-01-05 A201 strings5
3 2021-01-22 B602 strings7
4 2021-01-09 B203 strings6
df_2 =
Material Description
0 200 N600
1 300 N500
2 201 N900
3 602 N800
4 700 N801
I want to put the right letter in front of the Material
number. What I did was to delete the letters in front of the content of Col2
, create an extra column that says whether it used to be an A
or a B
, and then added this to the Material
. But of course it is very inefficient and I have a problem when some Materials are not in Col2
and vice-versa. What I want would be:
df_2 =
Material Description
0 A200 N600
1 B300 N500
2 A201 N900
3 B602 N800
Is there any way to do this otherwise?
CodePudding user response:
Use Series.map
with dictionary - if need select by last match:
d = dict(zip(df_1['Col2'].str[1:].astype(int), df_1['Col2']))
df_2['Material'] = df_2['Material'].map(d)
df_2 = df_2.dropna(subset=['Material'])
print (df_2)
Material Description
0 B200 N600
1 B300 N500
2 A201 N900
3 B602 N800
If need mapping by first match:
df = df_1.assign(new = df_1['Col2'].str[1:].astype(int)).drop_duplicates('new')
d = dict(zip(df['new'], df['Col2']))
df_2['Material'] = df_2['Material'].map(d)
df_2 = df_2.dropna(subset=['Material'])
print (df_2)
Material Description
0 A200 N600
1 B300 N500
2 A201 N900
3 B602 N800
CodePudding user response:
Here's another approach:
Basically split "Col2" and merge
with df2
on "Material":
out = (df2
.merge(df1[['Col2']]
.assign(Material=df1['Col2']
.str.extract(r'(\D )(\d )')[1]
.astype(int)), on='Material')
.drop(columns='Material')
.rename(columns={'Col2':'Material'})[['Material', 'Description']])
Output:
Material Description
0 A200 N600
1 B300 N500
2 A201 N900
3 B602 N800