I have two dataframes (menu and orders) and I would like to label the items in the menu dataframe based on how popular it is (the number of times it appears in 'orders' dataframe).
import pandas as pd
menu = pd.DataFrame(
{'Item_Name':['Chicken Pizza','Mushroom Soup','Tiramisu'],
'Price':[8.99, 4.99, 5.99]})
orders = pd.DataFrame(
{'order':[1,2,3,4,5,6,7,8],
'Item Name':['Chicken Pizza','Mushroom Soup','Tiramisu','Chicken Pizza','Chicken Pizza','Mushroom Soup','Chicken Pizza','Tiramisu']})
menu
Item_Name Price
0 Chicken Pizza 8.99
1 Mushroom Soup 4.99
2 Tiramisu 5.99
orders
order Item Name
0 1 Chicken Pizza
1 2 Mushroom Soup
2 3 Tiramisu
3 4 Chicken Pizza
4 5 Chicken Pizza
5 6 Mushroom Soup
6 7 Chicken Pizza
7 8 Tiramisu
Desired Output: "1" if item ordered is more than the median of all items ordered and "0" if it is less or equal to the median number of times ordered.
Item_Name Price Popular
0 Chicken Pizza 8.99 1
1 Mushroom Soup 4.99 0
2 Tiramisu 5.99 0
I tried using this syntax but it does not work.
menu["Popular"] = lambda x: 1 if orders["Item Name"].count() > orders["Item Name"].median() else 0
CodePudding user response:
We need to groupby count
to get the number of each item. Then we can compare this to the median
of the counts not the entire DataFrame, convert the True/False values to 1/0 (astype
), and join
back to menu
on the Item_Name
column:
counts = orders.groupby('Item Name')['Item Name'].count()
menu = menu.join(
(counts > counts.median()).astype(int).rename('Popular'),
on='Item_Name'
)
Or with Series.value_counts
instead of groupby count
:
counts = orders['Item Name'].value_counts()
menu = menu.join(
(counts > counts.median()).astype(int).rename('Popular'),
on='Item_Name'
)
Either way gets menu
:
Item_Name Price Popular
0 Chicken Pizza 8.99 1
1 Mushroom Soup 4.99 0
2 Tiramisu 5.99 0
CodePudding user response:
You can groupby
"Item Name" size
to count the items, then compute the median and find elements greater than (gt
) it. The output being a boolean, one converts to int by changing type. Finally merge
the output Series of 0/1s with the "menu":
s = orders.groupby('Item Name').size()
menu.merge(s.gt(s.median()).astype(int).rename('Popular'),
left_on='Item_Name', right_index=True)
Output:
Item_Name Price Popular
0 Chicken Pizza 8.99 1
1 Mushroom Soup 4.99 0
2 Tiramisu 5.99 0