Home > database >  How do you group by a case statement with a count function?
How do you group by a case statement with a count function?

Time:04-17

I am trying to create a distribution based on counts. I have a dataset where the rows represent sales. There is a discount_code variable, which determines the discount_code used in the sale. There are 200 distinct discount codes.

I want to count the distinct discount codes that fall in grouped sales counts. For example, how many discount codes had 25-100 sales associated with it? This will tell me the distribution of successful codes (if success is a discount code having many sales).

I wrote this code to start:

SELECT discountcode,
    CASE
    WHEN COUNT(sales) < 25 THEN 'A. <25 sales'
    WHEN COUNT(sales) < 100 AND COUNT(sales) > 25 THEN 'B. <100 sales'
    WHEN COUNT(sales) < 200 AND COUNT(sales) > 100 THEN 'C. <200 sales'
    WHEN COUNT(sales) < 500 AND COUNT(sales) > 200 THEN 'D. <500 sales'
    WHEN COUNT(sales) < 1000 AND COUNT(sales) > 500 THEN 'E. <1,000 sales'
    WHEN COUNT(sales) < 5000 AND COUNT(sales) > 1000 THEN 'F. <5,000 sales'
    ELSE 'G. >5,000 sales'
    END AS discount_code_dist
FROM table
GROUP BY discountcode;

I can't group by an aggregate function, which is where I get stuck. I want to group by discount_code_dist. I think I need to count and group by within my case, but have not figured out how to do it.

I am trying not to use a subquery. Therefore, I prefer to know how to do it without one. However, I am curious to know how a subquery would do it (especially if that is the only way).

CodePudding user response:

I'd use a CTE/subquery

WITH X as (
  SELECT discountcode,
    CASE
    WHEN COUNT(sales) < 25 THEN 'A. <25 sales'
    WHEN COUNT(sales) < 100 THEN 'B. <100 sales'
    WHEN COUNT(sales) < 200 THEN 'C. <200 sales'
    WHEN COUNT(sales) < 500 THEN 'D. <500 sales'
    WHEN COUNT(sales) < 1000 THEN 'E. <1,000 sales'
    WHEN COUNT(sales) < 5000 THEN 'F. <5,000 sales'
    ELSE 'G. >5,000 sales'
    END AS discount_code_dist
  FROM table
  GROUP BY discountcode;
)
SELECT discount_code_dist, COUNT(*) FROM x GROUP BY discount_code_dist

You don't need to test for ranges - the case when is assessed in order, so you only need one bound

CodePudding user response:

Use materialized CTE:

with t1 as materialized 
(
    SELECT 
        discountcode,
        COUNT(sales) as sales_total
    FROM table
    GROUP BY discountcode
)
SELECT 
    discountcode,
    case
        WHEN sales_total < 25 THEN 'A. <25 sales'
        WHEN sales_total < 100 AND sales_total > 25 THEN 'B. <100 sales'
        WHEN sales_total < 200 AND sales_total > 100 THEN 'C. <200 sales'
        WHEN sales_total < 500 AND sales_total > 200 THEN 'D. <500 sales'
        WHEN sales_total < 1000 AND sales_total > 500 THEN 'E. <1,000 sales'
        WHEN sales_total < 5000 AND sales_total > 1000 THEN 'F. <5,000 sales'
    ELSE 'G. >5,000 sales' END AS discount_code_dist
FROM t1;
  • Related