How can also show null (or zero) values corresponding to Q1 and Q5 values of cluster column on SQLse-CodePudding

I made this exercise on SQL server: write a query that lists for each cluster the quantity of products that fall within it. The company wants to obtain an analysis of sales with respect to the average quantity of each product present in each order, classifying them into six clusters: Q1 (<15), Q2 (15-20), Q3 (21-25), Q4 (26-30), Q5 (31-35), Q6(>35). Write a query that lists, for each product, the product name and the cluster to which it belong. The database is northwind

select count(ProductName) as prod_num ,cluster

from  (
        select  ProductName,
        case 
        when avg(Quantity) < 15 then 'Q1'
        when avg(Quantity) <= 20 then 'Q2'
        when avg(Quantity) between 21 and 25 then 'Q3'
        when avg(Quantity) between 26 and 30 then 'Q4'
        when avg(Quantity) between 31 and 35 then 'Q5'
        else 'Q6'
        end
        as cluster
        from [Order Details] od  join Products pr on od.ProductID=pr.ProductID
        group by  ProductName
        ) as clusters  
group by  cluster
order by cluster

OUTPUT
22  Q2
35  Q3
18  Q4
2   Q6

I also need to display values for Q1 and Q5.

CodePudding user response：

You can always seed your initial counts, for example:

declare @clusters table (prod_num int, cluster nchar(2));
insert into @clusters values
    (0, 'Q1'),(0, 'Q2'),(0, 'Q3'),(0, 'Q4'),(0, 'Q5'),(0, 'Q6');
    
select 
    t1.cluster,
    t1.prod_num   isnull(t2.prod_num, 0) as prod_num
from 
    @clusters t1
    left join
        (
            select count(ProductName) as prod_num ,cluster
            from  (
                    select  ProductName,
                    case 
                    when avg(Quantity) < 15 then 'Q1'
                    when avg(Quantity) between 15 and 20 then 'Q2'
                    when avg(Quantity) between 21 and 25 then 'Q3'
                    when avg(Quantity) between 26 and 30 then 'Q4'
                    when avg(Quantity) between 31 and 35 then 'Q5'
                    else 'Q6'
                    end
                    as cluster
                    from [Order Details] od  join Products pr on od.ProductID=pr.ProductID
                    group by  ProductName
                    ) as clusters  
            group by  cluster
        ) t2
    on t1.cluster = t2.cluster
order by t1.cluster;

Now we have an initial count of zero for all groups and add to that the counts we found in our query.

Untested so please let me know if you find errors...

CodePudding user response：

You can make your query much simpler by storing the clusters and the ranges in a table. I use a #temp table here but there's no reason this couldn't be a static, permanent dimension table.

CREATE TABLE #clusters(cluster char(2), lo int, hi int);

INSERT #clusters VALUES('Q1', 0,14),('Q2',15,20),('Q3',21,25),
                       ('Q4',26,30),('Q5',31,35),('Q6',36,2000000000);

SELECT prod_num = COUNT(p.ProductName), cl.cluster
FROM #clusters AS cl
LEFT OUTER JOIN
(
  SELECT pr.ProductName, avgQ = AVG(od.Quantity) 
    FROM dbo.[Order Details] AS od
    INNER JOIN dbo.Products AS pr
    ON od.ProductID = pr.ProductID
    GROUP BY pr.ProductName
) AS p
ON p.avgQ BETWEEN cl.lo AND cl.hi
  GROUP BY cl.cluster;

Working example in this fiddle.

And again, unless ProductName is not unique and you really care about averages across different IDs with the same name as opposed to just product IDs, the join is unnecessary and you could simplify even further:

SELECT prod_num = COUNT(p.ProductID), cl.cluster
FROM #clusters AS cl
LEFT OUTER JOIN
(
  SELECT ProductID, avgQ = AVG(Quantity) 
    FROM dbo.[Order Details]
    GROUP BY ProductID
) AS p
ON p.avgQ BETWEEN cl.lo AND cl.hi
  GROUP BY cl.cluster;