Home > Back-end >  SQL to find sum of total days in a window for a series of changes
SQL to find sum of total days in a window for a series of changes

Time:12-21

Following is the table:

start_date recorded_date id
2021-11-10 2021-11-01 1a
2021-11-08 2021-11-02 1a
2021-11-11 2021-11-03 1a
2021-11-10 2021-11-04 1a
2021-11-10 2021-11-05 1a

I need a query to find the total day changes in aggregate for a given id. In this case, it changed from 10th Nov to 8th Nov so 2 days, then again from 8th to 11th Nov so 3 days and again from 11th to 10th for a day, and finally from 10th to 10th, that is 0 days.

In total there is a change of 2 3 1 0 = 6 days for the id - '1a'.

Basically for each change there is a recorded_date, so we arrange that in ascending order and then calculate the aggregate change of days grouped by id. The final result should be like:

id Agg_Change
1a 6

Is there a way to do this using SQL. I am using vertica database.

Thanks.

CodePudding user response:

you can use window function lead to get the difference between rows and then group by id

select id, sum(daydiff) Agg_Change
from (
select id, abs(datediff(day, start_Date, lead(start_date,1,start_date) over (partition by id order by recorded_date))) as daydiff
from tablename
) t group by id 

CodePudding user response:

I was thinking lag function will provide me the answer, but it kept giving me wrong answer because I had the wrong logic in one place. I have the answer I need:

with cte as(
select id, start_date, recorded_date,
row_number() over(partition by id order by recorded_date asc) as idrank,
lag(start_date,1) over(partition by id order by recorded_date asc) as prev
from table_temp
)
select id, sum(abs(date(start_date) - date(prev))) as Agg_Change
from cte
group by 1

If someone has a better solution please let me know.

CodePudding user response:

It's indeed the use of LAG() to get the previous date in an OLAP query, and an outer query getting the absolute date difference, and the sum of it, grouping by id:

WITH
-- your input - don't use in real query ...
indata(start_date,recorded_date,id) AS (
          SELECT DATE '2021-11-10',DATE '2021-11-01','1a'
UNION ALL SELECT DATE '2021-11-08',DATE '2021-11-02','1a'
UNION ALL SELECT DATE '2021-11-11',DATE '2021-11-03','1a'
UNION ALL SELECT DATE '2021-11-10',DATE '2021-11-04','1a'
UNION ALL SELECT DATE '2021-11-10',DATE '2021-11-05','1a'
)
-- real query starts here, replace following comma with "WITH" ...
,
w_lag AS (
  SELECT
    id
  , start_date
  , LAG(start_date) OVER w AS prevdt
FROM indata
WINDOW w AS (PARTITION BY id ORDER BY recorded_date)
)
SELECT
  id
, SUM(ABS(DATEDIFF(DAY,start_date,prevdt))) AS dtdiff
FROM w_lag
GROUP BY id
-- out  id | dtdiff 
-- out ---- --------
-- out  1a |      6
  • Related