I have this requirement to find records for people but only include the first during any 31-day period.
If a person has more than one record in a 31-day period, include only the first record. For example, if a person has a record on January 1, include the January 1 record and do not include records that occur on or between January 2 and January 31; then, if applicable, include the next record that occurs on or after February 1. Identify visits chronologically, including only one per 31-day period.
I've tried a variety of methods to try to make this work with NOT EXISTS, ROW NUMBER, RANK and loops but haven't been able to make it work right.
Here's how it should group the records:
Person Date RECORD_GROUP
JACK 2021-01-02 1
JACK 2021-01-21 1
JACK 2021-01-25 1
JACK 2021-02-01 1
JACK 2021-02-04 2
JACK 2021-02-21 2
JACK 2021-03-01 2
JACK 2021-03-12 3
JACK 2021-03-30 3
JACK 2021-04-10 3
JACK 2021-04-16 4
JILL 2021-01-20 1
JILL 2021-01-22 1
JILL 2021-01-26 1
JILL 2021-02-18 1
JILL 2021-02-23 2
JILL 2021-02-28 2
JILL 2021-03-08 2
JILL 2021-03-26 3
JILL 2021-03-30 3
JILL 2021-04-20 3
JILL 2021-04-30 4
Here's some sample data:
SELECT 'JACK' AS PERSON, '2021-01-02' AS RECORD_DATE, CAST(NULL AS SMALLINT) RECORD_GROUP
INTO #TEMP
UNION
SELECT 'JACK' AS PERSON, '2021-01-21' AS RECORD_DATE, CAST(NULL AS SMALLINT) RECORD_GROUP UNION
SELECT 'JACK' AS PERSON, '2021-01-25' AS RECORD_DATE, CAST(NULL AS SMALLINT) RECORD_GROUP UNION
SELECT 'JACK' AS PERSON, '2021-02-01' AS RECORD_DATE, CAST(NULL AS SMALLINT) RECORD_GROUP UNION
SELECT 'JACK' AS PERSON, '2021-02-04' AS RECORD_DATE, CAST(NULL AS SMALLINT) RECORD_GROUP UNION
SELECT 'JACK' AS PERSON, '2021-02-21' AS RECORD_DATE, CAST(NULL AS SMALLINT) RECORD_GROUP UNION
SELECT 'JACK' AS PERSON, '2021-03-01' AS RECORD_DATE, CAST(NULL AS SMALLINT) RECORD_GROUP UNION
SELECT 'JACK' AS PERSON, '2021-03-12' AS RECORD_DATE, CAST(NULL AS SMALLINT) RECORD_GROUP UNION
SELECT 'JACK' AS PERSON, '2021-03-30' AS RECORD_DATE, CAST(NULL AS SMALLINT) RECORD_GROUP UNION
SELECT 'JACK' AS PERSON, '2021-04-10' AS RECORD_DATE, CAST(NULL AS SMALLINT) RECORD_GROUP UNION
SELECT 'JACK' AS PERSON, '2021-04-16' AS RECORD_DATE, CAST(NULL AS SMALLINT) RECORD_GROUP UNION
SELECT 'JILL' AS PERSON, '2021-01-20' AS RECORD_DATE, CAST(NULL AS SMALLINT) RECORD_GROUP UNION
SELECT 'JILL' AS PERSON, '2021-01-22' AS RECORD_DATE, CAST(NULL AS SMALLINT) RECORD_GROUP UNION
SELECT 'JILL' AS PERSON, '2021-01-26' AS RECORD_DATE, CAST(NULL AS SMALLINT) RECORD_GROUP UNION
SELECT 'JILL' AS PERSON, '2021-02-18' AS RECORD_DATE, CAST(NULL AS SMALLINT) RECORD_GROUP UNION
SELECT 'JILL' AS PERSON, '2021-02-23' AS RECORD_DATE, CAST(NULL AS SMALLINT) RECORD_GROUP UNION
SELECT 'JILL' AS PERSON, '2021-02-28' AS RECORD_DATE, CAST(NULL AS SMALLINT) RECORD_GROUP UNION
SELECT 'JILL' AS PERSON, '2021-03-08' AS RECORD_DATE, CAST(NULL AS SMALLINT) RECORD_GROUP UNION
SELECT 'JILL' AS PERSON, '2021-03-26' AS RECORD_DATE, CAST(NULL AS SMALLINT) RECORD_GROUP UNION
SELECT 'JILL' AS PERSON, '2021-03-30' AS RECORD_DATE, CAST(NULL AS SMALLINT) RECORD_GROUP UNION
SELECT 'JILL' AS PERSON, '2021-04-20' AS RECORD_DATE, CAST(NULL AS SMALLINT) RECORD_GROUP UNION
SELECT 'JILL' AS PERSON, '2021-04-30' AS RECORD_DATE, CAST(NULL AS SMALLINT) RECORD_GROUP
Any suggestions will be appreciated.
CodePudding user response:
As far as I know, this can only be done iteratively:
- Find first row for each person
- Iterate over subsequent rows in order
- Keep track of the leader date... a date that begins the group of 31 days
- On each iteration, update it when current date is 30 days more than previous leader date
with ncte as (
select
person,
record_date,
row_number() over (partition by person order by record_date) as rn
from #temp
), rcte as (
select
person,
record_date,
record_date as leader_date,
rn
from ncte
where rn = 1
union all
select
curr.person,
curr.record_date,
case when curr.record_date > dateadd(day, 30, prev.leader_date) then curr.record_date else prev.leader_date end,
curr.rn
from ncte as curr
join rcte as prev on curr.person = prev.person and curr.rn = prev.rn 1
)
select * -- the leader_date could be used to group and number related rows
from rcte
order by 1, 2
CodePudding user response:
Agree. Because the disposition of any row potentially depends on the disposition of every prior row (within the same name partition), I believe this can only be done without some form of iteration.
Salman A's solution simulates iteration with a recursive CTE (common table expression). Here is a solution that uses a cursor loop.
DECLARE @Data TABLE (Id INT IDENTITY(1,1), Person VARCHAR(100), Date DATETIME, Expected INT, Actual INT)
INSERT @Data (Person, Date, Expected)
VALUES
...
DECLARE @Id INT
DECLARE @Person VARCHAR(100)
DECLARE @Date DATETIME
DECLARE @PriorPerson VARCHAR(100) = NULL
DECLARE @PriorDate DATETIME = NULL
DECLARE @GroupNumber INT
DECLARE Csr CURSOR FOR
SELECT Id, Person, Date
FROM @Data
ORDER BY Person, Date
OPEN Csr
FETCH NEXT FROM Csr INTO @Id, @Person, @Date
WHILE @@FETCH_STATUS = 0
BEGIN
IF @PriorPerson IS NULL OR @Person <> @PriorPerson
BEGIN
SET @GroupNumber = 1
SET @PriorPerson = @Person
SET @PriorDate = @Date
END
ELSE IF @Date >= DATEADD(DAY, 31, @PriorDate)
BEGIN
SET @GroupNumber = @GroupNumber 1
SET @PriorDate = @Date
END
UPDATE @Data SET Actual = @GroupNumber WHERE Id = @Id
FETCH NEXT FROM Csr INTO @Id, @Person, @Date
END
CLOSE Csr
DEALLOCATE Csr
SELECT *, Test = CASE WHEN Expected = Actual THEN 'Match' ELSE 'Mismatch' END
FROM @Data
ORDER BY ID
This depends on ID to track the current record. ("UPDATE .. WHERE CURRENT OF Cursor" does work with a cursor with ORDER BY.) If you cannot add an ID, you can update based on name and date.
See this db<>fiddle for a working demo.