I have the following pandas dataframe:
import pandas as pd
foo_dt = pd.DataFrame({'var_1': ['filter coffee', 'american cheesecake', 'espresso coffee', 'latte tea'],
'var_2': ['coffee', 'coffee black', 'tea', 'strawberry cheesecake']})
and the following dictionary:
foo_colors = {'coffee': 'brown', 'cheesecake': 'white', 'tea': 'green'}
I want to add two columns in foo_dt
(color_var_1
and color_var_2
), the values of which will be the respective value
of the foo_colors
dictionary which corresponds to the key
depending if the key
is in
the value of the column var_1
or var_2
respectively.
EDIT
In other words, for every key in foo_colors
, check where "it is contained" in both columns var_1
& var_2
, and then give as value of the respective column (color_var_1
& color_var_2
) the respective value of the dictionary
My resulting dataframe looks like this:
var_1 var_2 color_var_1 color_var_2
0 filter coffee coffee brown brown
1 american cheesecake coffee black white brown
2 espresso coffee tea brown green
3 latte tea strawberry cheesecake green white
Any idea how can I do this ?
CodePudding user response:
Use Series.str.extract
for get first matched substring created by join by |
for regex or
of keys in dict with Series.map
by dict:
pat = '|'.join(r"\b{}\b".format(x) for x in foo_colors)
for c in ['var_1','var_2']:
foo_dt[f'color_{c}'] = foo_dt[c].str.extract(f'({pat})', expand=False).map(foo_colors)
print(foo_dt)
var_1 var_2 color_var_1 color_var_2
0 filter coffee coffee brown brown
1 american cheesecake coffee black white brown
2 espresso coffee tea brown green
3 latte tea strawberry cheesecake green white