Rename columns with ranges based on dictionary

I have this dataframe:

import pandas as pd

df = pd.DataFrame({'an2': {0: 'f', 1: 'i', 2: '', 3: '', 4: 'f', 5: 'c,f,i,g', 6: 'c,d,e,g'}})

which yields:

    an2
0   f
1   i
2   
3   
4   f
5   c,f,i,g
6   c,d,e,g

I would like to create a new column df['an3'] by translating the values in df['an2'] according to the following dictionary:

dic = {'a': 'john', 
'b': 'paul', 
'c': 'mike',
'd': 'elephant',
'e': 'water', 
'f': 'bread', 
'g': 'julie',
'h': 'anna', 
'i': 'mauricio',
'j': 'claudia'}

The desired output is therefore:

    an2      an3
0   f        bread
1   i        mauricio
2       
3       
4   f        bread
5   c,f,i,g  mike,bread,mauricio,julie
6   c,d,e,g  mike,elephant,water,julie

I tried using the dictionary above with the following code:

df['an3'] = df['an2'].replace(dic)

Unfortunately, it only works for cells where df['an2'] contains a single entry.

CodePudding user response：

You can split each value on the comma, look up each piece with dict.get (which returns the original piece when there is no match), and then join the results back together with a comma:

df['an3'] = df['an2'].apply(lambda x: ','.join(dic.get(y,y) for y in x.split(',')))
print (df)
       an2                        an3
0        f                      bread
1        i                   mauricio
2                                    
3                                    
4        f                      bread
5  c,f,i,g  mike,bread,mauricio,julie
6  c,d,e,g  mike,elephant,water,julie
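
The same idea can be wrapped in a small helper so that it also tolerates missing values and stray spaces around the codes. This is only a sketch: the function name translate_codes, the shortened dictionary and the sample rows with None and 'c, z' are illustrative assumptions, not part of the question.

```python
import pandas as pd

dic = {'c': 'mike', 'f': 'bread', 'g': 'julie', 'i': 'mauricio'}

def translate_codes(cell, mapping, sep=','):
    """Translate each sep-separated code in `cell`; unknown codes are kept."""
    # Hypothetical helper: non-string cells (None/NaN) are returned unchanged.
    if not isinstance(cell, str):
        return cell
    codes = (code.strip() for code in cell.split(sep))
    return sep.join(mapping.get(code, code) for code in codes)

# Sample data with an empty string, a missing value and an unknown code 'z'.
df = pd.DataFrame({'an2': ['f', 'i', '', None, 'c,f,i,g', 'c, z']})
df['an3'] = df['an2'].apply(lambda cell: translate_codes(cell, dic))
print(df)
```

Codes that are not in the dictionary (here 'z') pass through unchanged, which matches the dic.get(y, y) fallback in the one-liner above.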

Or use a callable with Series.str.replace and word boundaries:

regex = '|'.join(r"\b{}\b".format(x) for x in dic.keys())
df['an3'] = df['an2'].str.replace(regex, lambda x: dic[x.group()], regex=True)
print (df)
       an2                        an3
0        f                      bread
1        i                   mauricio
2                                    
3                                    
4        f                      bread
5  c,f,i,g  mike,bread,mauricio,julie
6  c,d,e,g  mike,elephant,water,julie
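
If the dictionary keys might ever contain regex metacharacters, it may be safer to escape them before building the pattern. A minimal sketch, assuming a shortened dictionary and a made-up sample Series; re.escape and the single non-capturing group are additions of this example, not part of the answer above.

```python
import re
import pandas as pd

dic = {'c': 'mike', 'f': 'bread', 'g': 'julie', 'i': 'mauricio'}

# Escape every key so it is matched literally, then wrap the alternation
# in one non-capturing group surrounded by word boundaries.
pattern = r'\b(?:{})\b'.format('|'.join(re.escape(key) for key in dic))

s = pd.Series(['f', 'c,f,i,g', '', 'x'])

# The callable receives a match object; the pattern only matches keys of dic,
# so a direct lookup is safe here.
result = s.str.replace(pattern, lambda m: dic[m.group()], regex=True)
print(result.tolist())
```

Values with no matching key (the empty string and 'x' here) are left untouched because the pattern never matches them.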

CodePudding user response：

Let us try Series.replace:

df['an2'].replace({fr'\b{k}\b': v for k, v in dic.items()}, regex=True)

0                        bread
1                     mauricio
2                             
3                             
4                        bread
5    mike,bread,mauricio,julie
6    mike,elephant,water,julie
Name: an2, dtype: object
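
To see why the attempt in the question only handled single-entry cells, it may help to compare the two modes side by side. A small sketch with a made-up two-row Series: without regex=True the dictionary keys are compared against whole cell values, while with regex=True they are treated as patterns that can match inside a cell.

```python
import pandas as pd

s = pd.Series(['f', 'c,f'])

# Plain replace: a key must equal the entire cell value to be replaced.
plain = s.replace({'f': 'bread'})

# Regex replace: the key is a pattern, so it also matches inside 'c,f'.
with_regex = s.replace({r'\bf\b': 'bread'}, regex=True)

print(plain.tolist())
print(with_regex.tolist())
```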

CodePudding user response：

You can explode the values, map them through your dict, and reshape the result back to the original index:

df['an3'] = df['an2'].str.split(',').explode().map(dic).dropna() \
                     .groupby(level=0).apply(','.join) \
                     .reindex(df.index, fill_value='')
print(df)

# Output
       an2                        an3
0        f                      bread
1        i                   mauricio
2                                    
3                                    
4        f                      bread
5  c,f,i,g  mike,bread,mauricio,julie
6  c,d,e,g  mike,elephant,water,julie
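
Note that map followed by dropna discards any code that is missing from the dictionary. If unknown codes should be kept instead, one possible variant is sketched below; the shortened dictionary and the sample row 'c,x' are assumptions made up for illustration.

```python
import pandas as pd

dic = {'c': 'mike', 'f': 'bread'}
df = pd.DataFrame({'an2': ['f', '', 'c,x']})

# One row per code; the original row label is repeated for each piece.
codes = df['an2'].str.split(',').explode()

# Look up each code, falling back to the code itself when it is not in dic.
names = codes.map(lambda code: dic.get(code, code))

# Join the pieces back together per original row label.
df['an3'] = names.groupby(level=0).agg(','.join)
print(df)
```

Because nothing is dropped, every original row keeps at least one piece, so the reindex step is not needed in this variant.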