I have the following table structure:
date | New Customers |
---|---|
01.04.21 | 4 |
01.05.21 | 1 |
01.06.21 | 2 |
01.07.21 | 6 |
01.08.21 | 3 |
01.09.21 | 2 |
01.10.21 | 3 |
01.11.21 | 8 |
01.12.21 | 3 |
01.01.22 | 4 |
01.02.22 | 0 |
01.03.22 | 3 |
01.04.22 | 3 |
01.05.22 | 2 |
01.06.22 | 3 |
01.07.22 | 1 |
01.08.22 | 3 |
01.09.22 | 2 |
01.10.22 | 3 |
01.11.22 | 1 |
01.12.22 | 6 |
01.01.23 | 8 |
01.02.23 | 4 |
I am struggling in Snowflake with a query that should show me the the sum of previous 12 months for every distinct month in the table.
I am trying to create a result set that looks like:
Reporting Date | Customer # |
---|---|
01.03.22 | 39 |
01.04.22 | 38 |
01.05.22 | 39 |
01.06.22 | 40 |
01.07.22 | 35 |
01.08.22 | 35 |
01.09.22 | 35 |
01.10.22 | 35 |
01.11.22 | 28 |
01.12.22 | 31 |
01.01.23 | 35 |
01.02.23 | 39 |
The way reporting date 01.03.2022 is calculated: it is the sum of past 12 months row of "new customers" columns = 4 1 2 6 3 2 3 8 4 0 3= 39
CodePudding user response:
Seems to work - but you need an additional nested query to build the running sum and a filter value to remove the first 12 months:
WITH
-- your input, don't use in final query ..
indata(dt,New_Customers) AS (
SELECT DATE '2021-04-01',4 UNION ALL SELECT DATE '2021-05-01',1
UNION ALL SELECT DATE '2021-06-01',2 UNION ALL SELECT DATE '2021-07-01',6
UNION ALL SELECT DATE '2021-08-01',3 UNION ALL SELECT DATE '2021-09-01',2
UNION ALL SELECT DATE '2021-10-01',3 UNION ALL SELECT DATE '2021-11-01',8
UNION ALL SELECT DATE '2021-12-01',3 UNION ALL SELECT DATE '2022-01-01',4
UNION ALL SELECT DATE '2022-02-01',0 UNION ALL SELECT DATE '2022-03-01',3
UNION ALL SELECT DATE '2022-04-01',3 UNION ALL SELECT DATE '2022-05-01',2
UNION ALL SELECT DATE '2022-06-01',3 UNION ALL SELECT DATE '2022-07-01',1
UNION ALL SELECT DATE '2022-08-01',3 UNION ALL SELECT DATE '2022-09-01',2
UNION ALL SELECT DATE '2022-10-01',3 UNION ALL SELECT DATE '2022-11-01',1
UNION ALL SELECT DATE '2022-12-01',6 UNION ALL SELECT DATE '2023-01-01',8
UNION ALL SELECT DATE '2023-02-01',4
)
-- real query starts here, replace following comma with "WITH"
,
-- need to build the row number and the running sum first in a subselect ...
olap AS (
SELECT
dt
, SUM(new_customers) OVER(ORDER BY dt ROWS BETWEEN 11 PRECEDING AND CURRENT ROW) AS running_cust_count
, ROW_NUMBER() OVER(ORDER BY dt ) AS rownum
FROM indata
)
SELECT
dt AS rep_date
, running_cust_count
FROM olap
WHERE rownum >= 12
;
-- out rep_date | running_cust_count
-- out ------------ --------------------
-- out 2022-03-01 | 39
-- out 2022-04-01 | 38
-- out 2022-05-01 | 39
-- out 2022-06-01 | 40
-- out 2022-07-01 | 35
-- out 2022-08-01 | 35
-- out 2022-09-01 | 35
-- out 2022-10-01 | 35
-- out 2022-11-01 | 28
-- out 2022-12-01 | 31
-- out 2023-01-01 | 35
-- out 2023-02-01 | 39
CodePudding user response:
You can try a window function to create aggregates:
select date, SUM(customers) OVER (ORDER BY date
ROWS BETWEEN 11 PRECEDING AND CURRENT ROW)
from table
group by date, customers
order by 1;
This is Postgres, but if I remember well this should work in Snowflake too.