How do I do a groupby in python to split between orders?-CodePudding

I have a dataframe that shows order listing. How do I use it to find the number of orders that contain spicy food?

   Order       Item Name Spicy
0   1000  Calamari Rings    No
1   1001      Cheesecake    No
2   1001   Spicy Chicken    Yes
3   1002      Spicy Lamb    Yes

Using this code gives me 2 Yes and 2 No, but it should actually be 2 Yes and 1 No as order 1001 is duplicated. Thank you.

dd = df_orders.groupby("Item Name")["Order"].count()

I would like to get an output that shows the number of orders that contains spicy food.

   Spicy        
0    Yes   2
1     No   1

CodePudding user response：

Use sort_values to separate 'No' and 'Yes' spicy values then drop_duplicates before value_counts:

>>> df.sort_values('Spicy') \
      .drop_duplicates('Order', keep='last') \
      .value_counts('Spicy')

Spicy
Yes    2
No     1
dtype: int64

CodePudding user response：

I am not so sure about how your output will look. You can try to sort according to spicy column such that yes comes first, then drop duplicates and count, this ensures you retain the yes:

df_orders = pd.DataFrame({'Order':[1000,1001,1001,1002],
'Item Name':['Calamari Rings','Cheesecake','Spicy Chicken','Spicy Lamb'],
'Spicy':['No','No','Yes','Yes']})

df_orders.sort_values("Spicy",ascending=False).drop_duplicates('Order')['Spicy'].value_counts()

Output:

Yes    2
No     1
Name: Spicy, dtype: int64

CodePudding user response：

There's perhaps a simpler way, but this works.

First group by order and spicy counts, to get the count of spicy for each order. Then sort by spicy and drop duplicates by order number (removes the 'No' in spicy column if a yes exists for that order). Then group by Spicy again and count to get the counts.

df_orders = pd.DataFrame({'Order' : [1000, 1001, 1001, 1002],
                          'Item Name' : ['Calamari Rings', 'Cheesecake', 'Spicy Chicken', 'Spicy Lamb'],
                          'Spicy' : ['No', 'No', 'Yes', 'Yes']})

df_grouped = df_orders.groupby(['Order', 'Spicy']).count().reset_index()
df_grouped = df_grouped.sort_values(by='Spicy').drop_duplicates(subset='Order', keep='last')
df_grouped = df_grouped.groupby('Spicy').count()['Order'].reset_index()

Output:

  Spicy  Order
0    No      1
1   Yes      2

CodePudding user response：

first groupby base on Spicy and then convert to a dict and with a loop on dict keys to access numbers of order that contains spicy food:

    dictionary = df_orders.groupby('Spicy')['Order].apply(list).to_dict()
    for k in dictionary:
       print(k, ':',len(dictionary[k]))