Home > Mobile >  Group records within 31-day period of first record
Group records within 31-day period of first record

Time:03-04

I have this requirement to find records for people but only include the first during any 31-day period.

If a person has more than one record in a 31-day period, include only the first record. For example, if a person has a record on January 1, include the January 1 record and do not include records that occur on or between January 2 and January 31; then, if applicable, include the next record that occurs on or after February 1. Identify visits chronologically, including only one per 31-day period.

I've tried a variety of methods to try to make this work with NOT EXISTS, ROW NUMBER, RANK and loops but haven't been able to make it work right.

Here's how it should group the records:

Person  Date    RECORD_GROUP
JACK    2021-01-02  1
JACK    2021-01-21  1
JACK    2021-01-25  1
JACK    2021-02-01  1
JACK    2021-02-04  2
JACK    2021-02-21  2
JACK    2021-03-01  2
JACK    2021-03-12  3
JACK    2021-03-30  3
JACK    2021-04-10  3
JACK    2021-04-16  4
JILL    2021-01-20  1
JILL    2021-01-22  1
JILL    2021-01-26  1
JILL    2021-02-18  1
JILL    2021-02-23  2
JILL    2021-02-28  2
JILL    2021-03-08  2
JILL    2021-03-26  3
JILL    2021-03-30  3
JILL    2021-04-20  3
JILL    2021-04-30  4

Here's some sample data:

SELECT 'JACK' AS PERSON, '2021-01-02' AS RECORD_DATE, CAST(NULL AS SMALLINT) RECORD_GROUP
INTO #TEMP 
UNION 
SELECT 'JACK' AS PERSON, '2021-01-21' AS RECORD_DATE, CAST(NULL AS SMALLINT) RECORD_GROUP UNION 
SELECT 'JACK' AS PERSON, '2021-01-25' AS RECORD_DATE, CAST(NULL AS SMALLINT) RECORD_GROUP UNION 
SELECT 'JACK' AS PERSON, '2021-02-01' AS RECORD_DATE, CAST(NULL AS SMALLINT) RECORD_GROUP UNION 
SELECT 'JACK' AS PERSON, '2021-02-04' AS RECORD_DATE, CAST(NULL AS SMALLINT) RECORD_GROUP UNION 
SELECT 'JACK' AS PERSON, '2021-02-21' AS RECORD_DATE, CAST(NULL AS SMALLINT) RECORD_GROUP UNION 
SELECT 'JACK' AS PERSON, '2021-03-01' AS RECORD_DATE, CAST(NULL AS SMALLINT) RECORD_GROUP UNION 
SELECT 'JACK' AS PERSON, '2021-03-12' AS RECORD_DATE, CAST(NULL AS SMALLINT) RECORD_GROUP UNION 
SELECT 'JACK' AS PERSON, '2021-03-30' AS RECORD_DATE, CAST(NULL AS SMALLINT) RECORD_GROUP UNION 
SELECT 'JACK' AS PERSON, '2021-04-10' AS RECORD_DATE, CAST(NULL AS SMALLINT) RECORD_GROUP UNION 
SELECT 'JACK' AS PERSON, '2021-04-16' AS RECORD_DATE, CAST(NULL AS SMALLINT) RECORD_GROUP UNION 
SELECT 'JILL' AS PERSON, '2021-01-20' AS RECORD_DATE, CAST(NULL AS SMALLINT) RECORD_GROUP UNION 
SELECT 'JILL' AS PERSON, '2021-01-22' AS RECORD_DATE, CAST(NULL AS SMALLINT) RECORD_GROUP UNION 
SELECT 'JILL' AS PERSON, '2021-01-26' AS RECORD_DATE, CAST(NULL AS SMALLINT) RECORD_GROUP UNION 
SELECT 'JILL' AS PERSON, '2021-02-18' AS RECORD_DATE, CAST(NULL AS SMALLINT) RECORD_GROUP UNION 
SELECT 'JILL' AS PERSON, '2021-02-23' AS RECORD_DATE, CAST(NULL AS SMALLINT) RECORD_GROUP UNION 
SELECT 'JILL' AS PERSON, '2021-02-28' AS RECORD_DATE, CAST(NULL AS SMALLINT) RECORD_GROUP UNION 
SELECT 'JILL' AS PERSON, '2021-03-08' AS RECORD_DATE, CAST(NULL AS SMALLINT) RECORD_GROUP UNION 
SELECT 'JILL' AS PERSON, '2021-03-26' AS RECORD_DATE, CAST(NULL AS SMALLINT) RECORD_GROUP UNION 
SELECT 'JILL' AS PERSON, '2021-03-30' AS RECORD_DATE, CAST(NULL AS SMALLINT) RECORD_GROUP UNION 
SELECT 'JILL' AS PERSON, '2021-04-20' AS RECORD_DATE, CAST(NULL AS SMALLINT) RECORD_GROUP UNION 
SELECT 'JILL' AS PERSON, '2021-04-30' AS RECORD_DATE, CAST(NULL AS SMALLINT) RECORD_GROUP  

Any suggestions will be appreciated.

CodePudding user response:

As far as I know, this can only be done iteratively:

  • Find first row for each person
  • Iterate over subsequent rows in order
  • Keep track of the leader date... a date that begins the group of 31 days
  • On each iteration, update it when current date is 30 days more than previous leader date
with ncte as (
    select
        person,
        record_date,
        row_number() over (partition by person order by record_date) as rn
    from #temp
), rcte as (
    select
        person,
        record_date,
        record_date as leader_date,
        rn
    from ncte
    where rn = 1

    union all

    select
        curr.person,
        curr.record_date,
        case when curr.record_date > dateadd(day, 30, prev.leader_date) then curr.record_date else prev.leader_date end,
        curr.rn
    from ncte as curr
    join rcte as prev on curr.person = prev.person and curr.rn = prev.rn   1
)
select * -- the leader_date could be used to group and number related rows
from rcte
order by 1, 2

Demo on DB<>Fiddle

CodePudding user response:

Agree. Because the disposition of any row potentially depends on the disposition of every prior row (within the same name partition), I believe this can only be done without some form of iteration.

Salman A's solution simulates iteration with a recursive CTE (common table expression). Here is a solution that uses a cursor loop.

DECLARE @Data TABLE (Id INT IDENTITY(1,1), Person VARCHAR(100), Date DATETIME, Expected INT, Actual INT)
INSERT @Data (Person, Date, Expected)
VALUES
    ...

DECLARE @Id INT
DECLARE @Person VARCHAR(100)
DECLARE @Date DATETIME
DECLARE @PriorPerson VARCHAR(100) = NULL
DECLARE @PriorDate DATETIME = NULL
DECLARE @GroupNumber INT

DECLARE Csr CURSOR FOR 
    SELECT Id, Person, Date
    FROM @Data
    ORDER BY Person, Date
OPEN Csr

FETCH NEXT FROM Csr INTO @Id, @Person, @Date
WHILE @@FETCH_STATUS = 0
BEGIN
    IF @PriorPerson IS NULL OR @Person <> @PriorPerson
    BEGIN
        SET @GroupNumber = 1
        SET @PriorPerson = @Person
        SET @PriorDate = @Date
    END
    ELSE IF @Date >= DATEADD(DAY, 31, @PriorDate)
    BEGIN
        SET @GroupNumber = @GroupNumber   1
        SET @PriorDate = @Date
    END
    UPDATE @Data SET Actual = @GroupNumber WHERE Id = @Id

    FETCH NEXT FROM Csr INTO @Id, @Person, @Date
END

CLOSE Csr
DEALLOCATE Csr

SELECT *, Test = CASE WHEN Expected = Actual THEN 'Match' ELSE 'Mismatch' END
FROM @Data
ORDER BY ID

This depends on ID to track the current record. ("UPDATE .. WHERE CURRENT OF Cursor" does work with a cursor with ORDER BY.) If you cannot add an ID, you can update based on name and date.

See this db<>fiddle for a working demo.

  • Related