Editing the content of column values based on another dataframe-CodePudding

I have these dataframes:

df_1 = 
    Col1             Col2      Col3
0   2021-01-02       A200      strings1
1   2021-01-03       B300      strings4
2   2021-01-05       A201      strings5
3   2021-01-22       B602      strings7
4   2021-01-09       B203      strings6

df_2 =

   Material     Description
0       200            N600
1       300            N500
2       201            N900
3       602            N800
4       700            N801

I want to put the right letter in front of the Material number. What I did was to delete the letters in front of the content of Col2, create an extra column that says whether it used to be an A or a B, and then added this to the Material. But of course it is very inefficient and I have a problem when some Materials are not in Col2 and vice-versa. What I want would be:

df_2 =

    Material     Description
0       A200            N600
1       B300            N500
2       A201            N900
3       B602            N800

Is there any way to do this otherwise?

CodePudding user response：

Use Series.map with dictionary - if need select by last match:

d = dict(zip(df_1['Col2'].str[1:].astype(int), df_1['Col2']))
df_2['Material'] = df_2['Material'].map(d)

df_2 = df_2.dropna(subset=['Material'])
print (df_2)
  Material Description
0     B200        N600
1     B300        N500
2     A201        N900
3     B602        N800

If need mapping by first match:

df = df_1.assign(new = df_1['Col2'].str[1:].astype(int)).drop_duplicates('new')
d = dict(zip(df['new'], df['Col2']))
df_2['Material'] = df_2['Material'].map(d)

df_2 = df_2.dropna(subset=['Material'])
print (df_2)
  Material Description
0     A200        N600
1     B300        N500
2     A201        N900
3     B602        N800

CodePudding user response：

Here's another approach:

Basically split "Col2" and merge with df2 on "Material":

out = (df2
       .merge(df1[['Col2']]
              .assign(Material=df1['Col2']
                      .str.extract(r'(\D )(\d )')[1]
                      .astype(int)), on='Material')
       .drop(columns='Material')
       .rename(columns={'Col2':'Material'})[['Material', 'Description']])

Output:

  Material Description
0     A200        N600
1     B300        N500
2     A201        N900
3     B602        N800