Map df array column with dict

Time:02-17

I have a dataframe which has columns of arrays:

id_food1    id_food2
[1]       NaN
[2]       NaN
[2 3]     [1]

I want to map the values in these columns using a dict:

food_dict = {1: 'cake',
             2: 'choco',
             3: 'cream'}

I want to have something like this :

id_food1    id_food2    id_food1_name    id_food2_name
[1]         NaN         [cake]           0
[2]         NaN         [choco]          0
[2 3]       [1]         [choco, cream]   [cake]

I know how to do it when the columns do not hold arrays, like this:

data['id_food1_name'] = data['id_food1'].map(food_dict)

but I am unable to do it when the cells are arrays.

Any help will be highly appreciated

CodePudding user response:

Use Series.explode to flatten the values, map them, and finally aggregate them back into lists per index:

data['id_food1_name'] = (data['id_food1'].explode().astype(float)
                                  .map(food_dict).groupby(level=0).agg(list))
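
To make the explode / map / groupby chain easier to try out, here is a minimal self-contained sketch. It assumes the cells hold real Python lists of ints (not strings), rebuilds a small frame from the question, and wraps the chain in a helper named map_list_column, which is a hypothetical name used only for illustration.

```python
import numpy as np
import pandas as pd

food_dict = {1: 'cake', 2: 'choco', 3: 'cream'}

# Small frame rebuilt from the question; cells are assumed to be Python lists.
data = pd.DataFrame({
    'id_food1': [[1], [2], [2, 3]],
    'id_food2': [np.nan, np.nan, [1]],
})

def map_list_column(col, mapping):
    """Hypothetical helper: map every element of a list-valued column."""
    exploded = col.explode()          # one row per list element, index repeated
    mapped = exploded.map(mapping)    # look each id up in the dict
    # collect the mapped values back into one list per original row
    return mapped.groupby(level=0).agg(list)

data['id_food1_name'] = map_list_column(data['id_food1'], food_dict)
print(data['id_food1_name'].tolist())
```

The repeated index that explode leaves behind is what lets groupby(level=0) put the mapped values back on the right rows.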

For all columns:

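# converting strings to lists
import ast
import numpy as np

c = ['id_food1', 'id_food2']

def f(x):
    try:
        return ast.literal_eval(x)
    # NaN and anything that is not a valid Python literal become NaN
    except Exception:
        return np.nan

# applymap works elementwise; pandas 2.1+ deprecates it in favor of DataFrame.map
data[c] = data[c].applymap(f)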

An alternative way to convert the strings to lists:

data[c] = data[c].stack().str.strip('[]').str.split().unstack()
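
A small sketch of what this string-splitting variant produces, assuming the cells are strings such as '[2 3]' (space-separated, as in the question's printout). The frame name raw is made up for the example.

```python
import numpy as np
import pandas as pd

# Assumed input: the arrays were stored as strings, with NaN for missing cells.
raw = pd.DataFrame({
    'id_food1': ['[1]', '[2]', '[2 3]'],
    'id_food2': [np.nan, np.nan, '[1]'],
})
c = ['id_food1', 'id_food2']

parsed = (raw[c].stack()             # long format: one string per (row, column)
                .str.strip('[]')     # drop the surrounding brackets
                .str.split()         # split on whitespace into a list
                .unstack())          # back to the original wide shape

print(parsed.loc[2, 'id_food1'])
```

Note that the list elements come out as strings ('2', '3'), not ints, so a later lookup in a dict with int keys has to cast them with int() first.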

Then do the mapping:

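for col in c:
    # keep only ids present in food_dict; int() covers ids stored as strings
    f = lambda ids: [food_dict.get(int(y)) for y in ids if int(y) in food_dict]
    # dropna() skips the NaN cells, which fillna(0) then sets to 0
    data[f'{col}_name'] = data[col].dropna().apply(f)
    data[f'{col}_name'] = data[f'{col}_name'].fillna(0)
print(data)

Output:

  id_food1 id_food2   id_food1_name id_food2_name
0      [1]      NaN          [cake]             0
1      [2]      NaN         [choco]             0
2   [2, 3]      [1]  [choco, cream]        [cake]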
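
The same per-cell mapping can also be written as a plain function, which handles the NaN cells in one pass. This is only a sketch under the assumption that each cell is either a list-like of ids or NaN; the function name ids_to_names is hypothetical.

```python
import numpy as np
import pandas as pd

food_dict = {1: 'cake', 2: 'choco', 3: 'cream'}

data = pd.DataFrame({
    'id_food1': [[1], [2], [2, 3]],
    'id_food2': [np.nan, np.nan, [1]],
})

def ids_to_names(ids):
    """Hypothetical helper: list of ids -> list of names, anything else -> 0."""
    if not isinstance(ids, (list, tuple, np.ndarray)):
        return 0  # NaN (or any other scalar) is reported as 0
    # int() lets the lookup work for ids stored as strings too
    return [food_dict[int(i)] for i in ids if int(i) in food_dict]

for col in ['id_food1', 'id_food2']:
    data[f'{col}_name'] = data[col].apply(ids_to_names)

print(data)
```

Compared with the dropna / fillna pair, the explicit type check keeps the NaN handling inside the function, at the cost of a slightly longer helper.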

CodePudding user response:

You could use a dict comprehension in which each column goes through explode, map, groupby and agg (with an if-else to convert NaN to 0), then assign the result back to df.

df = (df.assign(**{f'id_food{i}_name': df[f'id_food{i}']
                   .explode()
                   .map(food_dict)
                   .groupby(level=0)
                   .agg(lambda x: x.tolist() if x.notna().all() else 0) 
                   for i in range(1, df[['id_food1','id_food2']].shape[1] + 1)}))

Output:

  id_food1 id_food2   id_food1_name id_food2_name
0      [1]      NaN          [cake]             0
1      [2]      NaN         [choco]             0
2   [2, 3]      [1]  [choco, cream]        [cake]
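
One consequence of the x.notna().all() check is worth seeing in isolation: it returns 0 for the whole row as soon as any element fails to map, not only when the original cell was NaN. The sketch below assumes a made-up id 9 that has no entry in food_dict.

```python
import pandas as pd

food_dict = {1: 'cake', 2: 'choco', 3: 'cream'}

# 9 is a made-up id with no entry in food_dict
ids = pd.Series([[1], [2, 9]])

names = (ids.explode()
            .map(food_dict)      # 9 has no match, so it maps to NaN
            .groupby(level=0)
            # a single NaN in the group turns the whole row into 0
            .agg(lambda x: x.tolist() if x.notna().all() else 0))

print(names.tolist())
```

If partial matches should be kept instead, dropping the NaN values inside the lambda before calling tolist() would be one option.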