I have a dataframe that shows order listing. How do I use it to find the number of orders that contain spicy food?
Order Item Name Spicy
0 1000 Calamari Rings No
1 1001 Cheesecake No
2 1001 Spicy Chicken Yes
3 1002 Spicy Lamb Yes
Using this code gives me 2 Yes and 2 No, but it should actually be 2 Yes and 1 No as order 1001 is duplicated. Thank you.
dd = df_orders.groupby("Item Name")["Order"].count()
I would like to get an output that shows the number of orders that contains spicy food.
Spicy
0 Yes 2
1 No 1
CodePudding user response:
Use sort_values
to separate 'No' and 'Yes' spicy values then drop_duplicates
before value_counts
:
>>> df.sort_values('Spicy') \
.drop_duplicates('Order', keep='last') \
.value_counts('Spicy')
Spicy
Yes 2
No 1
dtype: int64
CodePudding user response:
I am not so sure about how your output will look. You can try to sort according to spicy
column such that yes
comes first, then drop duplicates and count, this ensures you retain the yes:
df_orders = pd.DataFrame({'Order':[1000,1001,1001,1002],
'Item Name':['Calamari Rings','Cheesecake','Spicy Chicken','Spicy Lamb'],
'Spicy':['No','No','Yes','Yes']})
df_orders.sort_values("Spicy",ascending=False).drop_duplicates('Order')['Spicy'].value_counts()
Output:
Yes 2
No 1
Name: Spicy, dtype: int64
CodePudding user response:
There's perhaps a simpler way, but this works.
First group by order and spicy counts, to get the count of spicy for each order. Then sort by spicy and drop duplicates by order number (removes the 'No' in spicy column if a yes exists for that order). Then group by Spicy again and count to get the counts.
df_orders = pd.DataFrame({'Order' : [1000, 1001, 1001, 1002],
'Item Name' : ['Calamari Rings', 'Cheesecake', 'Spicy Chicken', 'Spicy Lamb'],
'Spicy' : ['No', 'No', 'Yes', 'Yes']})
df_grouped = df_orders.groupby(['Order', 'Spicy']).count().reset_index()
df_grouped = df_grouped.sort_values(by='Spicy').drop_duplicates(subset='Order', keep='last')
df_grouped = df_grouped.groupby('Spicy').count()['Order'].reset_index()
Output:
Spicy Order
0 No 1
1 Yes 2
CodePudding user response:
first groupby base on Spicy and then convert to a dict and with a loop on dict keys to access numbers of order that contains spicy food:
dictionary = df_orders.groupby('Spicy')['Order].apply(list).to_dict()
for k in dictionary:
print(k, ':',len(dictionary[k]))