The basic requirement is this: We capture sales by day of week and product. If more than half* of the day's sales came from one product, we want to capture that. Else we show "none".
So image we sell shoes, pants and shirts. On Monday, we sold $100 of each. So it was a three way split, and each category accounted for 33.3% of sales. We show "none". On Tuesday though, half of our sales came from shoes, and on Wednesday, 80% from shirts. So we want to see that.
The query below returns the desired result, but I'm not a fan of a queries within queries within queries. They can be inefficient and hard to read, and I feel like there's a cleaner way. Can this be improved upon?
*The requirement for half will be a parameter (@threshold here). In some cases, we might want to show only when it's 75% or more of sales. Obviously that parameter has to be >= 50%.
declare @sales as table (day_of_week varchar(16), product varchar(8), sales_amt int)
insert into @sales values ('monday', 'shoes', 100)
insert into @sales values ('monday', 'pants', 100)
insert into @sales values ('monday', 'shirts', 100)
insert into @sales values ('tuesday', 'shoes', 500)
insert into @sales values ('tuesday', 'pants', 300)
insert into @sales values ('tuesday', 'shirts', 200)
insert into @sales values ('wednesday', 'shoes', 100)
insert into @sales values ('wednesday', 'pants', 100)
insert into @sales values ('wednesday', 'shirts', 800)
declare @threshold as decimal(3,2) = 0.5
select day_of_week, case when pct_of_day >= @threshold then product else 'none' end half_of_sales from (
select day_of_week, product, pct_of_day, row_number() over (partition by day_of_week order by pct_of_day desc) _rn
from (
select day_of_week, product, sum(sales_amt) * 1.0 / sum(sum(sales_amt)) over (partition by day_of_week) pct_of_day
from @sales
group by day_of_week, product
) x
) z
where _rn = 1
CodePudding user response:
maybe a little easier to read?
DECLARE @threshold AS decimal(3, 2) = 0.5;
WITH ssum
AS (SELECT
day_of_week,
SUM(sales_amt) sa
FROM @sales
GROUP BY day_of_week)
SELECT
s.day_of_week,
MAX(CASE WHEN s.sales_amt * 1.0 / ssum.sa >= @threshold THEN s.product ELSE 'none' END) threshold
FROM ssum
INNER JOIN @sales AS s
ON ssum.day_of_week = s.day_of_week
GROUP BY s.day_of_week
CodePudding user response:
Firstly, you can place the nested queries in CTEs, which can make them easier to read. It won't make them more efficient, but then nested queries are not necessarily inefficient in themselves, not sure why you think so
Second, the query could be optimized, because the row-numbering is equally valid on the non-percentaged sum(sales_amt)
value, so it can be on the same level as the windowed sum over
declare @threshold as decimal(3,2) = 0.5;
with GroupedSales as (
select
day_of_week,
product,
sum(sales_amt) * 1.0 / sum(sum(sales_amt)) over (partition by day_of_week) pct_of_day,
row_number() over (partition by day_of_week order by sum(sales_amt) desc) _rn
from @sales
group by
day_of_week,
product
)
select
day_of_week,
case when pct_of_day >= @threshold
then product
else 'none'
end half_of_sales
from GroupedSales
where _rn = 1;