I have a situation that I would like to resolve in SQL using as few inner joins as possible, so I am thinking of using window functions.
I am using Snowflake, but I am happy to see answers in other DB engines as well.
Table sales has sale_id and seller_id, as well as sale_datetime, amount... Table seller has seller_id
I would like to get the first and second sales for each seller, as columns preferably (so that I can add and subtract eg sum of amounts). If there is only one sale, ideally I would not retrieve it.
I am trying to use something like this:
select se.seller_id,sa.*, first_value(sale_datetime) over (ORDER BY
sale_datetime ), LAG(sale_datetime) OVER (ORDER BY sale_datetime)
from seller se inner join sales sa on sa.seller_id = se.seller_id
order by se.seller_id;
But I would need to add a group by, because the window functions need to be applied at the seller level, not to the total of the table. Whether I add it at the end of the query or inside the window, I get an error.
The other way that I have of doing this without windows is with two with...as clauses that retrieve first and second sale for each seller respectively and a join between them, but I am trying to avoid this for performance reasons.
Thanks!
CodePudding user response:
You answer was already mostly there. You just need to use the LAG and use ROW_NUMBER/QUALIFY
CTE for Data:
with sales(sale_id, seller_id, sale_datetime, amount) as (
select * from values
(4, 10, '2022-04-24'::date, 100),
(3, 10, '2022-04-23'::date, 200),
(2, 10, '2022-04-22'::date, 400),
(1, 11, '2022-04-22'::date, 300)
)
working SQL:
select sa.*,
LAG(sale_datetime) OVER (partition by sa.seller_id ORDER BY sa.sale_datetime) as prior_sale_date,
LAG(amount) OVER (partition by sa.seller_id ORDER BY sa.sale_datetime) as prior_amount
FROM sales AS sa
qualify row_number() over(partition by sa.seller_id order by sale_datetime) = 2
order by sa.seller_id;
gives:
SALE_ID | SELLER_ID | SALE_DATETIME | AMOUNT | PRIOR_SALE_DATE | PRIOR_AMOUNT |
---|---|---|---|---|---|
2 | 10 | 2022-04-23 | 200 | 2022-04-22 | 400 |
this works because every row get the prior values via the two LAG's, but we only keep the second row.
I Also dropped the seller se
table because it is add no value to the process, as it's just a filter.., but that can be present via:
more complex data CTE's:
select * from values
(4, 10, '2022-04-24'::date, 100),
(3, 10, '2022-04-23'::date, 200),
(2, 10, '2022-04-22'::date, 400),
(1, 11, '2022-04-22'::date, 300)
), seller(seller_id) as (
select * from values (10),(11),(12),(13)
)
working SQL with JOIN back in.
select sa.*,
LAG(sale_datetime) OVER (partition by sa.seller_id ORDER BY sa.sale_datetime) as prior_sale_date,
LAG(amount) OVER (partition by sa.seller_id ORDER BY sa.sale_datetime) as prior_amount
FROM seller as se
JOIN sales AS sa
ON se.seller_id = sa.seller_id
qualify row_number() over(partition by sa.seller_id order by sale_datetime) = 2
order by sa.seller_id;
gives:
SALE_ID | SELLER_ID | SALE_DATETIME | AMOUNT | PRIOR_SALE_DATE | PRIOR_AMOUNT |
---|---|---|---|---|---|
3 | 10 | 2022-04-23 | 200 | 2022-04-22 | 400 |
CodePudding user response:
ROW_NUMBER and QUALIFY could be used:
select se.seller_id,sa.*
from seller se
join sales sa
on sa.seller_id = se.seller_id
qualify row_number() over(partition by sa.seller_id order by sale_datetime) <= 2
order by se.seller_id;