Home > Mobile >  Creating new column by mapping to dictionary (with string contain match)
Creating new column by mapping to dictionary (with string contain match)

Time:11-05

I am trying to create in df1 the column Factor based on the dictionary df2. However the Code columns for mapping are not exactly the same and the dictionary only contain the Code strings partially.

import pandas as pd
df1 = pd.DataFrame({
    'Date':['2021-01-01', '2021-01-01', '2021-01-01', '2021-01-02', '2021-01-02', '2021-01-02', '2021-01-02', '2021-01-03'],
    'Ratings':[9.0, 8.0, 5.0, 3.0, 2, 3, 6, 5],
    'Code':['R:EST 5R', 'R:EKG EK', 'R:EKG EK', 'R:EST 5R', 'R:EKGP', 'R:EST 5R', 'R:OID_P', 'R:OID_P']})

df2 = pd.DataFrame({
    'Code':['R:EST', 'R:EKG', 'R:OID'],
    'Factor':[1, 1.3, 0.9]})

So far, I wasn't able to map the data frames correctly, because the columns are not exactly the same. The column Code does not necessary start with "R:".

df1['Factor'] = df1['Code'].map(df2.set_index('Code')['Factor'])

This is how the preferred output would look like:

df3 = pd.DataFrame({
    'Date':['2021-01-01', '2021-01-01', '2021-01-01', '2021-01-02', '2021-01-02', '2021-01-02', '2021-01-02', '2021-01-03'],
    'Ratings':[9.0, 8.0, 5.0, 3.0, 2, 3, 6, 5],
    'Code':['R:EST 5R', 'R:EKG EK', 'R:EKG EK', 'R:EST 5R', 'R:EKGP', 'R:EST 5R', 'R:OID_P', 'R:OID_P'],
    'Factor':[1, 1.3, 1.3, 1, 1.3, 1, 0.9, 0.9]})

Thanks a lot!

CodePudding user response:

>>> df1['Code'].str[:5].map(df2.set_index('Code')['Factor'])
0    1.0
1    1.3
2    1.3
3    1.0
4    1.3
5    1.0
6    0.9
7    0.9
Name: Code, dtype: float64

>>> df2.Code.apply(lambda x:df1.Code.str.contains(x)).T.idxmax(axis=1).apply(lambda x:df2.Factor.iloc[x])

0    1.0
1    1.3
2    1.3
3    1.0
4    1.3
5    1.0
6    0.9
7    0.9
dtype: float64
  • Related