I am trying to sum sales on my table grouped for next 30 days.
Here is the data I have.
date | sales |
---|---|
2021-08-08 | 35 |
2021-08-08 | 14 |
2021-08-11 | 35 |
2021-09-09 | 22 |
2021-09-21 | 44 |
2021-10-16 | 46 |
2021-10-25 | 9 |
2021-10-25 | 1 |
2021-10-25 | 2 |
2021-10-25 | 6 |
2021-11-04 | 1 |
2021-11-07 | 1 |
I am expecting the result like this
date | sales | total 30d |
---|---|---|
2021-08-08 | 14 | 84 |
2021-08-08 | 35 | 84 |
2021-08-11 | 35 | 57 |
2021-09-09 | 22 | 66 |
2021-09-21 | 44 | 90 |
2021-10-16 | 46 | 76 |
2021-10-25 | 9 | 30 |
2021-10-25 | 6 | 30 |
2021-10-25 | 2 | 30 |
2021-10-25 | 1 | 30 |
2021-11-04 | 1 | 12 |
2021-11-07 | 1 | 29 |
Explanation: For date 2021-08-08 it's going to sum the sales from date >= 2021-08-08 til 30 days after which is 2021-09-07 so it would be 14 35 35 (2021-08-08, 2021-08-08 and 2021-08-11).
I got my work around using this query, but I think it is very inefficient and I believe there's other way using windows function though I can't wrap my head around it.
with temp as (
select date, sum(sales) as sales_total
from test_table
group by 1,2
)
, temp2 as (
SELECT a.date, SUM(b.sales_total) total
FROM temp a, temp b
WHERE b.date >= a.date AND b.date <= a.date interval '30' day
GROUP BY a.date
)
select a.date, a.sales, b.total
from test_table a
JOIN temp2 b on a.date = b.date
Any pointer to solve this problem using windows function or any other more efficient way is appreciated. Thanks!
CodePudding user response:
Here is a way to do this using recursive cte.
with data
as (select start_dt,sales,row_number() over(order by start_dt) as rnk
from t1
)
,cte_data
as ( select d.start_dt,d.sales,d.rnk,d.start_dt as grp_dt
from data d
where rnk=1
union all
select d.start_dt,d.sales,d.rnk,case when datediff(day,d2.grp_dt,d.start_dt)>30 then
d.start_dt
else d2.grp_dt
end
from cte_data d2
join data d
on d.rnk=d2.rnk 1
)
select min(start_dt) start_range,max(start_dt) end_range,sum(sales) as sum_sales
from cte_data
group by grp_dt
------------- ------------ -----------
| start_range | end_range | sum_sales |
------------- ------------ -----------
| 2021-08-08 | 2021-08-11 | 84 |
| 2021-09-09 | 2021-09-21 | 66 |
| 2021-10-16 | 2021-11-07 | 66 |
------------- ------------ -----------
sql fiddle link https://dbfiddle.uk/2LmQe5n5
To elaborate a bit on this. We would need some kind of recursive process to do this, as the results of the following row depends on the previous entry date, and so I doubt we could do this using window functions.
CodePudding user response:
It seems that you have an overlapping groups problem, you could use a self-join and aggregate as the following:
SELECT T.date, T.sales,
SUM(D.sales) AS total_30d
FROM
test_table T JOIN test_table D
ON D.date BETWEEN T.date AND T.date interval '30' day
GROUP BY T.date, T.sales
ORDER BY T.date, T.sales
See a demo.
CodePudding user response:
Straightforward using a scalar subquery:
select ext.*,
(
select sum(sales) from the_table
where "date" between ext.date and ext.date 30
) as total_30d
from the_table as ext;
DB-fiddle demo.
date | sales | sum |
---|---|---|
2021-08-08 | 35 | 84 |
2021-08-08 | 14 | 84 |
2021-08-11 | 35 | 57 |
2021-09-09 | 22 | 66 |
2021-09-21 | 44 | 90 |
2021-10-16 | 46 | 66 |
2021-10-25 | 9 | 20 |
2021-10-25 | 1 | 20 |
2021-10-25 | 2 | 20 |
2021-10-25 | 6 | 20 |
2021-11-04 | 1 | 2 |
2021-11-07 | 1 | 1 |