I know this is a common question, but I couldn't find something that matches my case. I have this data:
--------- ---------
| user_id | view_dt |
--------- ---------
| A | 1/1 |
--------- ---------
| A | 1/10 |
--------- ---------
| A | 1/14 |
--------- ---------
| A | 1/22 |
--------- ---------
| A | 1/23 |
--------- ---------
| A | 1/30 |
--------- ---------
I want this data to be grouped based on a 14 day interval. That is, the groups would be:
GROUP 1: 1/1, 1/10, 1/14
GROUP 2: 1/22, 1/23, 1/30
Note that I the 1/30 date should belong to GROUP 2 since 1/30 should be compared to the first date of GROUP 2 (1/22) instead of (1/1).
The problem I am having is my own query shows 1/30 belonging to group 3.
CREATE TABLE T (
user_id VARCHAR(20),
view_dt DATETIME
);
INSERT INTO t VALUES ('A', '2022-01-01');
INSERT INTO t VALUES ('A', '2022-01-10');
INSERT INTO t VALUES ('A', '2022-01-14');
INSERT INTO t VALUES ('A', '2022-01-22');
INSERT INTO t VALUES ('A', '2022-01-23');
INSERT INTO t VALUES ('A', '2022-01-30');
SELECT user_id,
view_dt,
DENSE_RANK() OVER(ORDER BY gr) grp
FROM (
SELECT
user_id,
view_dt,
CAST (view_dt - MIN (view_dt) OVER (PARTITION BY user_id ORDER BY view_dt) AS INT )/14 1 AS gr
FROM T
) x
ORDER BY user_id
Ideal Output
--------- --------- -------
| user_id | view_dt | group |
--------- --------- -------
| A | 1/1 | 1 |
--------- --------- -------
| A | 1/10 | 1 |
--------- --------- -------
| A | 1/14 | 1 |
--------- --------- -------
| A | 1/22 | 2 |
--------- --------- -------
| A | 1/23 | 2 |
--------- --------- -------
| A | 1/30 | 2 |
--------- --------- -------
Output from my previous query:
--------- --------- -------
| user_id | view_dt | group |
--------- --------- -------
| A | 1/1 | 1 |
--------- --------- -------
| A | 1/10 | 1 |
--------- --------- -------
| A | 1/14 | 1 |
--------- --------- -------
| A | 1/22 | 2 |
--------- --------- -------
| A | 1/23 | 2 |
--------- --------- -------
| A | 1/30 | 3** |
--------- --------- -------
CodePudding user response:
One option is to use Recursive CTE
-- Recursive CTE solution
with cte as
(
-- CTE for adding a row_number
select rn = row_number() over (partition by user_id order by view_dt),
user_id, view_dt
from T
),
rcte as
(
-- RCTE - anchor member
-- first_dt is the first date of the group
select rn, user_id, view_dt, grp = 1, first_dt = view_dt
from cte
where rn = 1
union all
-- RCTE - recursive member
-- if date is more than 14 days from first_dt, grp 1, update first_dt
select c.rn, c.user_id, c.view_dt,
grp = case when datediff(day, r.first_dt, c.view_dt) > 14
then r.grp 1
else r.grp
end,
first_dt = case when datediff(day, r.first_dt, c.view_dt) > 14
then c.view_dt
else r.first_dt
end
from cte c
inner join rcte r on c.user_id = r.user_id
and c.rn = r.rn 1
)
select *
from rcte
Note : Please do avoid using arithmetic operator on date. view_dt - MIN (view_dt)
. Should use datediff()
See Bad Habits to Kick : Using shorthand with date/time operations