Home > Software engineering >  Pandas extracting/subsetting by a specific value from a groupby function
Pandas extracting/subsetting by a specific value from a groupby function

Time:05-30

I have a data frame, where I would like to make visible specific values from a status column, such as "Delivered" and/or "Confirmed".

orders.groupby(by=['status','size'])['id'].count()

I have never done this before, so I consulted the official Pandas documentation, I suspect the approach is to use .get_group().

orders.groupby(by=['status','size']).get_group(("Delivered")).agg("count")

However this generates an exception "ValueError: must supply a tuple to get_group with multiple grouping keys".

I am lost is this expecting "Delivered" then perhaps all of the viable entries for the size column? If so I am not sure I will know this upfront.

Is there another way to subset retaining just "Delivered" or "Confirmed"?

Many thanks! enter image description here

CodePudding user response:

I think the simplest solution is to filter only the rows with status == 'Delivered' before grouping.

delivered_id_count = orders[orders.status == 'Delivered'].groupby(by='size')['id'].count()

Or If you want to check for multiple status values, compute all the groups as you are doing and then select each one separately.

id_counts = orders.groupby(by=['status','size'])['id'].count()


# delivered status 
print(id_counts.loc['Delivered'])

# confirmed status
print(id_counts.loc['Confirmed'])
  • Related