Home > Mobile >  Pandas map multikey dictionary to dataframe
Pandas map multikey dictionary to dataframe

Time:05-21

I have following dataset:

 ──────────────────────────────── ───────────────── ───────────────── 
| product_name                   | column_value_1  | column_value_2  |
 ──────────────────────────────── ───────────────── ───────────────── 
| Coca Cola 1L                   | 1               | 1.8             |
| Carbonated drink Coca Cola 1L  | NaN             | 1.9             |
| Кола Кола                      | 2               | NaN             |
| Yellow Cheese                  | NaN             | 4.2             |
| Packed Yellow Cheese 1KG       | 4               | 5               |
| Packed Yellow Cheese           | 4.8             | 5               |
| Yogurt                         | 2.2             | 2               |
 ──────────────────────────────── ───────────────── ───────────────── 

and a following dictionary

product_map = {
    ("Кока Кола", "Coca Cola 1L", "Carbonated drink Coca Cola 1L") : "Coca Cola 1L" ,
    ("Yellow Cheese", "Packed Yellow Cheese 1KG", "Packed Yellow Cheese") : "Packed Yellow Chees" 
}

Please note that the last product does not exists in the dictionary, that's why I should get nothing in the group_by_column

I want to achieve following table, which after that I want to group by this column. How is this possible, also open for different approaches rather than using dictionary and map.

 ──────────────────────────────── ───────────────── ───────────────── ────────────────────── 
| product_name                   | column_value_1  | column_value_2  | group_by_column      |
 ──────────────────────────────── ───────────────── ───────────────── ────────────────────── 
| Coca Cola 1L                   | 1               | 1.8             | Coca Cola 1L         |
| Carbonated drink Coca Cola 1L  | NaN             | 1.9             | Coca Cola 1L         |
| Coca Cola                      | 2               | NaN             | Coca Cola 1L         |
| Yellow Cheese                  | NaN             | 4.2             | Packed Yellow Chees  |
| Packed Yellow Cheese 1KG       | 4               | 5               | Packed Yellow Chees  |
| Packed Yellow Cheese           | 4.8             | 5               | Packed Yellow Chees  |
| Yogurt                         | 2.2             | 2               | NaN                  |
 ──────────────────────────────── ───────────────── ───────────────── ────────────────────── 

CodePudding user response:

Having product_map expanded should simplify things a lot:

product_map_expanded = {i:val for key, val in product_map.items() for i in key }

product_map_expanded now looks like:

{'Кола Кола': 'Coca Cola 1L',
 'Coca Cola 1L': 'Coca Cola 1L',
 'Carbonated drink Coca Cola 1L': 'Coca Cola 1L',
 'Yellow Cheese': 'Packed Yellow Chees',
 'Packed Yellow Cheese 1KG': 'Packed Yellow Chees',
 'Packed Yellow Cheese': 'Packed Yellow Chees'}

Now you can have it mapped as:

df['group_by_column'] = df.product_name.map(product_map_expanded)

Which makes your dataframe:

product_name column_value_1 column_value_2 group_by_column
0 Coca Cola 1L 1 1.8 Coca Cola 1L
1 Carbonated drink Coca Cola 1L NaN 1.9 Coca Cola 1L
2 Кола Кола 2 NaN Coca Cola 1L
3 Yellow Cheese NaN 4.2 Packed Yellow Chees
4 Packed Yellow Cheese 1KG 4 5 Packed Yellow Chees
5 Packed Yellow Cheese 4.8 5 Packed Yellow Chees
6 Yogurt 2.2 2 nan

CodePudding user response:

You can use a key-list dictionary and apply lambda to the new column that call a function to search the index name in the list and return it's key.

df = pd.DataFrame(df).set_index('product_name')

product_map = {
    "Coca Cola 1L": ["Кола Кола", "Coca Cola 1L", "Carbonated drink Coca Cola 1L"],
    "Packed Yellow Chees": ["Yellow Cheese", "Packed Yellow Cheese 1KG", "Packed Yellow Cheese"],
}


def get_key(k):
    global product_map
    for key, item_list in product_map.items():
        if k in item_list:
            return key


df['group_by_column'] = df.apply(lambda x: get_key(x.name), axis=1)
print(df)


                                column_value_1  column_value_2      group_by_column
product_name
Coca Cola 1L                              1.0             1.8         Coca Cola 1L
Carbonated drink Coca Cola 1L             NaN             1.9         Coca Cola 1L
Кола Кола                                 2.0             NaN         Coca Cola 1L
Yellow Cheese                             NaN             4.2  Packed Yellow Chees
Packed Yellow Cheese 1KG                  4.0             5.0  Packed Yellow Chees
Packed Yellow Cheese                      4.8             5.0  Packed Yellow Chees
Yogurt                                    2.2             2.0                 None
  • Related