Map df array column with dict

I have a dataframe which has columns of arrays:

id_food1    id_food2
[1]         NaN
[2]         NaN
[2 3]       [1]

I want to map these columns using a dict:

food_dict = {1: 'cake',
             2: 'choco',
             3: 'cream'}

I want to get something like this:

id_food1    id_food2  id_food1_name   id_food2_name
[1]         NaN       [cake]          0
[2]         NaN       [choco]         0
[2 3]       [1]       [choco, cream]  [cake]

I know how to do this when the columns hold scalar values rather than arrays:

data['id_food1_name'] = data['id_food1'].map(food_dict)

but I am unable to do it when the values are arrays.

Any help will be highly appreciated.

CodePudding user response：

Use Series.explode to flatten the values, map them, and finally aggregate them back into lists per index:

data['id_food1_name'] = (data['id_food1'].explode().astype(float)
                                  .map(food_dict).groupby(level=0).agg(list))
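
As a self-contained illustration of the explode, map and aggregate steps, here is a minimal sketch. It assumes the cells of id_food1 are real Python lists of ints (not strings), so the astype(float) step is left out; the sample frame is rebuilt from the question.

```python
import numpy as np
import pandas as pd

food_dict = {1: 'cake', 2: 'choco', 3: 'cream'}

# Sample frame from the question, assuming the cells hold real lists
data = pd.DataFrame({
    'id_food1': [[1], [2], [2, 3]],
    'id_food2': [np.nan, np.nan, [1]],
})

data['id_food1_name'] = (
    data['id_food1']
    .explode()          # one row per list element, original index repeated
    .map(food_dict)     # look up each id in the dict
    .groupby(level=0)   # regroup by the original row index
    .agg(list)          # collect the names back into one list per row
)

print(data['id_food1_name'].tolist())
```

Because explode repeats the original index for every element, groupby(level=0) is enough to put each row's names back together.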

For all columns:

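# Convert string representations such as "[2, 3]" into lists
import ast
import numpy as np

c = ['id_food1', 'id_food2']

def f(x):
    try:
        return ast.literal_eval(x)
    except (ValueError, SyntaxError, TypeError):
        # NaN and unparsable values stay missing
        return np.nan

# applymap is deprecated since pandas 2.1; DataFrame.map replaces it there
data[c] = data[c].applymap(f)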

Alternative way to convert the strings to lists:

data[c] = data[c].stack().str.strip('[]').str.split().unstack()

And then map:

for col in c:
    # Look up each id, skipping ids that are not in food_dict
    f = lambda ids: [food_dict[int(y)] for y in ids if int(y) in food_dict]
    data[f'{col}_name'] = data[col].dropna().apply(f)
    # Rows that were NaN become 0
    data[f'{col}_name'] = data[f'{col}_name'].fillna(0)
print(data)
  id_food1 id_food2   id_food1_name id_food2_name
0      [1]      NaN          [cake]             0
1      [2]      NaN         [choco]             0
2   [2, 3]      [1]  [choco, cream]        [cake]
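
The per-cell logic of the loop above (map a list of ids, turn a missing cell into 0) could also be written as one named function. This is only a sketch: map_ids and its default parameter are hypothetical names, not part of pandas or of the answer.

```python
import math

food_dict = {1: 'cake', 2: 'choco', 3: 'cream'}

def map_ids(ids, mapping=food_dict, default=0):
    """Return the mapped names for a list of ids, or `default` for a missing cell."""
    # A missing cell shows up as None or as a float NaN
    if ids is None or (isinstance(ids, float) and math.isnan(ids)):
        return default
    # int(y) also accepts ids stored as strings such as '2';
    # ids that are not in the mapping are skipped
    return [mapping[int(y)] for y in ids if int(y) in mapping]

print(map_ids([2, 3]))        # ['choco', 'cream']
print(map_ids(['1']))         # ['cake']
print(map_ids(float('nan')))  # 0
```

Passing this function to Series.apply on each id column would handle the NaN rows in the same pass, instead of the separate dropna and fillna steps.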

CodePudding user response：

You could use a dict comprehension in which you explode + map + groupby + agg(list) (with an if-else to convert NaN to 0), then assign the result back to df.

df = (df.assign(**{f'id_food{i}_name': df[f'id_food{i}']
                   .explode()
                   .map(food_dict)
                   .groupby(level=0)
                   .agg(lambda x: x.tolist() if x.notna().all() else 0)
                   for i in range(1, df[['id_food1','id_food2']].shape[1] + 1)}))

Output:

  id_food1 id_food2   id_food1_name id_food2_name
0      [1]      NaN          [cake]             0
1      [2]      NaN         [choco]             0
2   [2, 3]      [1]  [choco, cream]        [cake]