Home > Blockchain >  How to add additional column based on the number of times it appear
How to add additional column based on the number of times it appear

Time:09-21

I have two dataframes (menu and orders) and I would like to label the items in the menu dataframe based on how popular it is (the number of times it appears in 'orders' dataframe).

import pandas as pd

menu = pd.DataFrame(
    {'Item_Name':['Chicken Pizza','Mushroom Soup','Tiramisu'],
    'Price':[8.99, 4.99, 5.99]})

orders = pd.DataFrame(
    {'order':[1,2,3,4,5,6,7,8],
    'Item Name':['Chicken Pizza','Mushroom Soup','Tiramisu','Chicken Pizza','Chicken Pizza','Mushroom Soup','Chicken Pizza','Tiramisu']})

menu

        Item_Name    Price  
0   Chicken Pizza     8.99
1   Mushroom Soup     4.99
2        Tiramisu     5.99

orders

   order      Item Name  
0   1     Chicken Pizza    
1   2     Mushroom Soup
2   3          Tiramisu    
3   4     Chicken Pizza
4   5     Chicken Pizza    
5   6     Mushroom Soup    
6   7     Chicken Pizza    
7   8          Tiramisu

Desired Output: "1" if item ordered is more than the median of all items ordered and "0" if it is less or equal to the median number of times ordered.

        Item_Name    Price    Popular
0   Chicken Pizza     8.99          1
1   Mushroom Soup     4.99          0
2        Tiramisu     5.99          0

I tried using this syntax but it does not work.

menu["Popular"] = lambda x: 1 if orders["Item Name"].count() > orders["Item Name"].median() else 0

CodePudding user response:

We need to groupby count to get the number of each item. Then we can compare this to the median of the counts not the entire DataFrame, convert the True/False values to 1/0 (astype), and join back to menu on the Item_Name column:

counts = orders.groupby('Item Name')['Item Name'].count()
menu = menu.join(
    (counts > counts.median()).astype(int).rename('Popular'),
    on='Item_Name'
)

Or with Series.value_counts instead of groupby count:

counts = orders['Item Name'].value_counts()
menu = menu.join(
    (counts > counts.median()).astype(int).rename('Popular'),
    on='Item_Name'
)

Either way gets menu:

       Item_Name  Price  Popular
0  Chicken Pizza   8.99        1
1  Mushroom Soup   4.99        0
2       Tiramisu   5.99        0

CodePudding user response:

You can groupby "Item Name" size to count the items, then compute the median and find elements greater than (gt) it. The output being a boolean, one converts to int by changing type. Finally merge the output Series of 0/1s with the "menu":

s = orders.groupby('Item Name').size()
menu.merge(s.gt(s.median()).astype(int).rename('Popular'),
           left_on='Item_Name', right_index=True)

Output:

       Item_Name  Price  Popular
0  Chicken Pizza   8.99        1
1  Mushroom Soup   4.99        0
2       Tiramisu   5.99        0
  • Related