I am trying to create a distribution based on counts. I have a dataset where the rows represent sales. There is a discount_code variable, which determines the discount_code used in the sale. There are 200 distinct discount codes.
I want to count the distinct discount codes that fall in grouped sales counts. For example, how many discount codes had 25-100 sales associated with it? This will tell me the distribution of successful codes (if success is a discount code having many sales).
I wrote this code to start:
SELECT discountcode,
CASE
WHEN COUNT(sales) < 25 THEN 'A. <25 sales'
WHEN COUNT(sales) < 100 AND COUNT(sales) > 25 THEN 'B. <100 sales'
WHEN COUNT(sales) < 200 AND COUNT(sales) > 100 THEN 'C. <200 sales'
WHEN COUNT(sales) < 500 AND COUNT(sales) > 200 THEN 'D. <500 sales'
WHEN COUNT(sales) < 1000 AND COUNT(sales) > 500 THEN 'E. <1,000 sales'
WHEN COUNT(sales) < 5000 AND COUNT(sales) > 1000 THEN 'F. <5,000 sales'
ELSE 'G. >5,000 sales'
END AS discount_code_dist
FROM table
GROUP BY discountcode;
I can't group by an aggregate function, which is where I get stuck. I want to group by discount_code_dist. I think I need to count and group by within my case, but have not figured out how to do it.
I am trying not to use a subquery. Therefore, I prefer to know how to do it without one. However, I am curious to know how a subquery would do it (especially if that is the only way).
CodePudding user response:
I'd use a CTE/subquery
WITH X as (
SELECT discountcode,
CASE
WHEN COUNT(sales) < 25 THEN 'A. <25 sales'
WHEN COUNT(sales) < 100 THEN 'B. <100 sales'
WHEN COUNT(sales) < 200 THEN 'C. <200 sales'
WHEN COUNT(sales) < 500 THEN 'D. <500 sales'
WHEN COUNT(sales) < 1000 THEN 'E. <1,000 sales'
WHEN COUNT(sales) < 5000 THEN 'F. <5,000 sales'
ELSE 'G. >5,000 sales'
END AS discount_code_dist
FROM table
GROUP BY discountcode;
)
SELECT discount_code_dist, COUNT(*) FROM x GROUP BY discount_code_dist
You don't need to test for ranges - the case when is assessed in order, so you only need one bound
CodePudding user response:
Use materialized CTE:
with t1 as materialized
(
SELECT
discountcode,
COUNT(sales) as sales_total
FROM table
GROUP BY discountcode
)
SELECT
discountcode,
case
WHEN sales_total < 25 THEN 'A. <25 sales'
WHEN sales_total < 100 AND sales_total > 25 THEN 'B. <100 sales'
WHEN sales_total < 200 AND sales_total > 100 THEN 'C. <200 sales'
WHEN sales_total < 500 AND sales_total > 200 THEN 'D. <500 sales'
WHEN sales_total < 1000 AND sales_total > 500 THEN 'E. <1,000 sales'
WHEN sales_total < 5000 AND sales_total > 1000 THEN 'F. <5,000 sales'
ELSE 'G. >5,000 sales' END AS discount_code_dist
FROM t1;