How to map multiple nested dictionaries to Pandas DataFrame-CodePudding

I have a Pandas DataFrame that looks something like this:

import pandas as pd

df = pd.DataFrame({
    "class":["a","b","c","b","a","b","c","c","a","c","b","c","c","a"],
    "country":["aus","usa","fra","aus","ger","usa","usa","ger","aus","usa","fra","fra","usa","aus"],
    "category":["w","x","y","x","z","z","w","x","y","y","x","z","y","z"]
})

    class   country category
0   a       aus     w
1   b       usa     x
2   c       fra     y
3   b       aus     x
4   a       ger     z
5   b       usa     z
6   c       usa     w
7   c       ger     x
8   a       aus     y
9   c       usa     y
10  b       fra     x
11  c       fra     z
12  c       usa     y
13  a       aus     z

Furthermore, I have a dictionary for each class, each of which is a nested dictionary containing a number for each category for each country:

class_a_numbers = {'aus': {'w': 100, 'x': 50, 'y': 80, 'z': 10},
 'fra': {'w': 120, 'x': 60, 'y': 90, 'z': 10},
 'ger': {'w': 105, 'x': 75, 'y': 85, 'z': 0},
 'usa': {'w': 140, 'x': 100, 'y': 110, 'z': 50}}

class_b_numbers = {'aus': {'w': 1000, 'x': 500, 'y': 800, 'z': 100},
 'fra': {'w': 1200, 'x': 600, 'y': 900, 'z': 100},
 'ger': {'w': 1050, 'x': 750, 'y': 850, 'z': 0},
 'usa': {'w': 1400, 'x': 1000, 'y': 1100, 'z': 500}}

class_c_numbers = {'aus': {'w': 10, 'x': 5, 'y': 9, 'z': 2},
 'fra': {'w': 13, 'x': 5, 'y': 8, 'z': 1},
 'ger': {'w': 9, 'x': 6, 'y': 3, 'z': 1},
 'usa': {'w': 12, 'x': 12, 'y': 10, 'z': 3}}

I would like to create a number column in df and populate it with the appropriate number from the appropriate dictionary, such that the outcome will look like this:

    class   country category    number
0   a       aus     w           100
1   b       usa     x           1000
2   c       fra     y           8
3   b       aus     x           500
4   a       ger     z           0
5   b       usa     z           500
6   c       usa     w           12
7   c       ger     x           6
8   a       aus     y           80
9   c       usa     y           10
10  b       fra     x           600
11  c       fra     z           1
12  c       usa     y           10
13  a       aus     z           10

My real dataset is quite large, so efficiency is an important factor. Normally, I would use map from a dictionary, but I can't figure out how to make it work with these three nested dictionaries.

CodePudding user response：

df.apply to the rescue:

numbers = {
    'a': class_a_numbers,
    'b': class_b_numbers,
    'c': class_c_numbers,
}

df["number"] = df.apply(lambda row: numbers[row["class"]][row["country"]][row["category"]], axis=1)

CodePudding user response：

For improve performance join all dictionaries together, create one dictioanry with tuple key and convert to Series with MultiIndex, so possible add new column by DataFrame.join:

d = {'a': class_a_numbers,'b': class_b_numbers,'c': class_c_numbers,}

d= {(k2,k, k1): v1  for k2, v2 in d.items() for k, v in v2.items() for k1, v1 in v.items()}
df = df.join(pd.Series(d, name='number'), on=['class','country','category'])
print (df)
   class country category  number
0      a     aus        w     100
1      b     usa        x    1000
2      c     fra        y       8
3      b     aus        x     500
4      a     ger        z       0
5      b     usa        z     500
6      c     usa        w      12
7      c     ger        x       6
8      a     aus        y      80
9      c     usa        y      10
10     b     fra        x     600
11     c     fra        z       1
12     c     usa        y      10
13     a     aus        z      10

Perfomance: If not use loop by apply or iterrows is increased performance:

df = pd.DataFrame({
    "class":["a","b","c","b","a","b","c","c","a","c","b","c","c","a"],
    "country":["aus","usa","fra","aus","ger","usa","usa","ger","aus","usa",
              "fra","fra","usa","aus"],
    "category":["w","x","y","x","z","z","w","x","y","y","x","z","y","z"]
})

#[140000 rows x 3 columns]
df = pd.concat([df] * 10000, ignore_index=True)
    
class_a_numbers = {'aus': {'w': 100, 'x': 50, 'y': 80, 'z': 10},
 'fra': {'w': 120, 'x': 60, 'y': 90, 'z': 10},
 'ger': {'w': 105, 'x': 75, 'y': 85, 'z': 0},
 'usa': {'w': 140, 'x': 100, 'y': 110, 'z': 50}}

class_b_numbers = {'aus': {'w': 1000, 'x': 500, 'y': 800, 'z': 100},
 'fra': {'w': 1200, 'x': 600, 'y': 900, 'z': 100},
 'ger': {'w': 1050, 'x': 750, 'y': 850, 'z': 0},
 'usa': {'w': 1400, 'x': 1000, 'y': 1100, 'z': 500}}

class_c_numbers = {'aus': {'w': 10, 'x': 5, 'y': 9, 'z': 2},
 'fra': {'w': 13, 'x': 5, 'y': 8, 'z': 1},
 'ger': {'w': 9, 'x': 6, 'y': 3, 'z': 1},
 'usa': {'w': 12, 'x': 12, 'y': 10, 'z': 3}}

d = {'a': class_a_numbers,'b': class_b_numbers,'c': class_c_numbers,}
numbers = {
    'a': class_a_numbers,
    'b': class_b_numbers,
    'c': class_c_numbers,
}


print (df)

In [119]: %timeit df.join(pd.Series({(k2,k, k1): v1  for k2, v2 in d.items() for k, v in v2.items() for k1, v1 in v.items()}, name='number'), on=['class','country','category'])
36.4 ms ± 288 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)

In [120]: %timeit df["number"] = df.apply(lambda row: numbers[row["class"]][row["country"]][row["category"]], axis=1)
2.18 s ± 121 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

CodePudding user response：

do something like this.

#make sure u spin this code around a form loop from range(0,len(df))
#this code is for class a dict only make same for class b and c will be same 
#just remove a and add 'b' , 'c'.
if df["class"]== a:# checks if class if a 
    # if class is a then go to class_a_numbers dict
    # Then it will search for the key value(country) 
    # Which is in your df dataframe.
    unpacked_dict = class_a_numbers[df.loc[i,"country]]

    #now your unpacked_dict contains a dict
    # dict =  {'w': 100, 'x': 50, 'y': 80, 'z': 10}
    # now we search the key whose value match df["category"] 
    #here i is from the for loop form 0 to len(df)

    number_val = unpacked_dict[df.loc[i,"category"]
    df.loc[i,"number"] =number_val

complete the code and check if this works if it did good if not ping me in comments will check it. and comments are just to help you understand ever point code is small.