Filtering on product frequency and category


I think there is a fairly simple solution to this but I just can't wrap my head around it right now.

I have a data frame with several hundred thousand orders. I am trying to find the nsmallest margin products by sub-category, but I am trying to filter out products that have been sold less than 2 separate times. e.g. if a product has only been involved in 1 transaction I don't want to include it in my output.

I would like my final output to be a frame of the nsmallest-margin products where 'Sub-Category' == 'Appliances' and the 'Product Name' value count is > 2.

The portion I'm getting stuck on is the "Product" value count portion. I just can't get it to work.

I've tried value_counts() as below:

df[(df['Sub-Category'] == 'Appliances') & (df.value_counts('Product Name') > 2)]\
.groupby('Product Name', as_index = False)['margin'].mean().nsmallest(20,'margin')

I've tried groupby().count()

df[(df['Sub-Category'] == 'Appliances') & (df.groupby('Product Name').count() > 2)]\
.groupby('Product Name', as_index = False)['margin'].mean().nsmallest(20,'margin')

I think I know the problem: I'm mixing an aggregated frame with a non-aggregated frame, but I can't figure out how to filter rows by the frequency of a 'Product Name'. I'm kind of brain-dead at this point.

Any suggestions?

Thanks in Advance

CodePudding user response:

I think the problem is that I'm filtering an aggregated frame and a non aggregated frame.

You are right. Use map to replace each product name with its value count, so the resulting mask is aligned row by row with the original frame.

Replace:

(df.value_counts('Product Name') > 2)

By:

(df['Product Name'].map(df.value_counts('Product Name')) > 2)

# OR

(df.groupby('Product Name')['Product Name'].transform('size') > 2)
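Putting it together, here is a minimal runnable sketch using a tiny invented frame (the column names match the question; the data is made up for illustration). Either mask variant above works; this uses the transform('size') form:

```python
import pandas as pd

# Toy stand-in for the real order data (invented for illustration).
df = pd.DataFrame({
    'Sub-Category': ['Appliances'] * 5 + ['Furniture'],
    'Product Name': ['Toaster', 'Toaster', 'Toaster',
                     'Kettle', 'Blender', 'Desk'],
    'margin': [1.0, 2.0, 3.0, 5.0, 4.0, 9.0],
})

# Row-aligned mask: True for rows whose product appears more than twice.
popular = df.groupby('Product Name')['Product Name'].transform('size') > 2

result = (
    df[(df['Sub-Category'] == 'Appliances') & popular]
    .groupby('Product Name', as_index=False)['margin'].mean()
    .nsmallest(20, 'margin')
)
print(result)
```

Only 'Toaster' survives the frequency filter here ('Kettle' and 'Blender' each appear once), so the output is its mean margin.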