I have following dataset:
──────────────────────────────── ───────────────── ─────────────────
| product_name | column_value_1 | column_value_2 |
──────────────────────────────── ───────────────── ─────────────────
| Coca Cola 1L | 1 | 1.8 |
| Carbonated drink Coca Cola 1L | NaN | 1.9 |
| Кола Кола | 2 | NaN |
| Yellow Cheese | NaN | 4.2 |
| Packed Yellow Cheese 1KG | 4 | 5 |
| Packed Yellow Cheese | 4.8 | 5 |
| Yogurt | 2.2 | 2 |
──────────────────────────────── ───────────────── ─────────────────
and a following dictionary
product_map = {
("Кока Кола", "Coca Cola 1L", "Carbonated drink Coca Cola 1L") : "Coca Cola 1L" ,
("Yellow Cheese", "Packed Yellow Cheese 1KG", "Packed Yellow Cheese") : "Packed Yellow Chees"
}
Please note that the last product does not exists in the dictionary, that's why I should get nothing in the group_by_column
I want to achieve following table, which after that I want to group by this column. How is this possible, also open for different approaches rather than using dictionary and map.
──────────────────────────────── ───────────────── ───────────────── ──────────────────────
| product_name | column_value_1 | column_value_2 | group_by_column |
──────────────────────────────── ───────────────── ───────────────── ──────────────────────
| Coca Cola 1L | 1 | 1.8 | Coca Cola 1L |
| Carbonated drink Coca Cola 1L | NaN | 1.9 | Coca Cola 1L |
| Coca Cola | 2 | NaN | Coca Cola 1L |
| Yellow Cheese | NaN | 4.2 | Packed Yellow Chees |
| Packed Yellow Cheese 1KG | 4 | 5 | Packed Yellow Chees |
| Packed Yellow Cheese | 4.8 | 5 | Packed Yellow Chees |
| Yogurt | 2.2 | 2 | NaN |
──────────────────────────────── ───────────────── ───────────────── ──────────────────────
CodePudding user response:
Having product_map expanded should simplify things a lot:
product_map_expanded = {i:val for key, val in product_map.items() for i in key }
product_map_expanded
now looks like:
{'Кола Кола': 'Coca Cola 1L',
'Coca Cola 1L': 'Coca Cola 1L',
'Carbonated drink Coca Cola 1L': 'Coca Cola 1L',
'Yellow Cheese': 'Packed Yellow Chees',
'Packed Yellow Cheese 1KG': 'Packed Yellow Chees',
'Packed Yellow Cheese': 'Packed Yellow Chees'}
Now you can have it mapped as:
df['group_by_column'] = df.product_name.map(product_map_expanded)
Which makes your dataframe:
product_name | column_value_1 | column_value_2 | group_by_column | |
---|---|---|---|---|
0 | Coca Cola 1L | 1 | 1.8 | Coca Cola 1L |
1 | Carbonated drink Coca Cola 1L | NaN | 1.9 | Coca Cola 1L |
2 | Кола Кола | 2 | NaN | Coca Cola 1L |
3 | Yellow Cheese | NaN | 4.2 | Packed Yellow Chees |
4 | Packed Yellow Cheese 1KG | 4 | 5 | Packed Yellow Chees |
5 | Packed Yellow Cheese | 4.8 | 5 | Packed Yellow Chees |
6 | Yogurt | 2.2 | 2 | nan |
CodePudding user response:
You can use a key-list dictionary and apply lambda to the new column that call a function to search the index name in the list and return it's key.
df = pd.DataFrame(df).set_index('product_name')
product_map = {
"Coca Cola 1L": ["Кола Кола", "Coca Cola 1L", "Carbonated drink Coca Cola 1L"],
"Packed Yellow Chees": ["Yellow Cheese", "Packed Yellow Cheese 1KG", "Packed Yellow Cheese"],
}
def get_key(k):
global product_map
for key, item_list in product_map.items():
if k in item_list:
return key
df['group_by_column'] = df.apply(lambda x: get_key(x.name), axis=1)
print(df)
column_value_1 column_value_2 group_by_column
product_name
Coca Cola 1L 1.0 1.8 Coca Cola 1L
Carbonated drink Coca Cola 1L NaN 1.9 Coca Cola 1L
Кола Кола 2.0 NaN Coca Cola 1L
Yellow Cheese NaN 4.2 Packed Yellow Chees
Packed Yellow Cheese 1KG 4.0 5.0 Packed Yellow Chees
Packed Yellow Cheese 4.8 5.0 Packed Yellow Chees
Yogurt 2.2 2.0 None