I have a data frame, where I would like to make visible specific values from a status column, such as "Delivered" and/or "Confirmed".
orders.groupby(by=['status','size'])['id'].count()
I have never done this before, so I consulted the official Pandas documentation, I suspect the approach is to use .get_group()
.
orders.groupby(by=['status','size']).get_group(("Delivered")).agg("count")
However this generates an exception "ValueError: must supply a tuple to get_group with multiple grouping keys".
I am lost is this expecting "Delivered" then perhaps all of the viable entries for the size column? If so I am not sure I will know this upfront.
Is there another way to subset retaining just "Delivered" or "Confirmed"?
CodePudding user response:
I think the simplest solution is to filter only the rows with status == 'Delivered'
before grouping.
delivered_id_count = orders[orders.status == 'Delivered'].groupby(by='size')['id'].count()
Or If you want to check for multiple status values, compute all the groups as you are doing and then select each one separately.
id_counts = orders.groupby(by=['status','size'])['id'].count()
# delivered status
print(id_counts.loc['Delivered'])
# confirmed status
print(id_counts.loc['Confirmed'])