Home > Back-end >  Calculate Rank Based on Shared Column Values and Consecutive Date Ranges (same rank for records with
Calculate Rank Based on Shared Column Values and Consecutive Date Ranges (same rank for records with

Time:12-09

I am trying to get the rank of a table that has specific id's and a start and end date for each record, as such:

id1 id2 flag startdate enddate
1 1 y 2007-01-10 2007-02-12
1 1 y 2007-02-13 2007-08-04
1 1 y 2007-08-05 2008-10-04
1 1 n 2008-10-05 2008-11-14
1 1 n 2008-11-15 2008-12-02
1 1 n 2008-12-08 2008-12-20
2 2 y 2012-01-10 2012-02-12
2 2 y 2012-02-13 2012-08-04
2 3 y 2012-01-10 2012-02-14
2 4 y 2012-08-14 2013-01-10
2 4 y 2013-01-15 2013-01-26
2 4 y 2013-01-27 2013-02-04
2 4 n 2016-03-14 2016-04-12

Where I essentially want to give the same count value to all records which share the same id1, id2, and flag, and are consecutive in their dates. Consecutive, meaning the start date of one record is equal to the end date of the previous record 1 day. The desired output should look like:

id1 id2 flag startdate enddate rank_t
1 1 y 2007-01-10 2007-02-12 1
1 1 y 2007-02-13 2007-08-04 1
1 1 y 2007-08-05 2008-10-04 1
1 1 n 2008-10-05 2008-11-14 2
1 1 n 2008-11-15 2008-12-02 2
1 1 n 2008-12-08 2008-12-20 3
2 2 y 2012-01-10 2012-02-12 4
2 2 y 2012-02-13 2012-08-04 4
2 3 y 2012-01-10 2012-02-14 5
2 4 y 2012-08-14 2013-01-10 6
2 4 y 2013-01-15 2013-01-26 7
2 4 y 2013-01-27 2013-02-04 7
2 4 n 2016-03-14 2016-04-12 8

The output or rank does not have to be in that exact order, but the idea is still the same. Records which share the same id1, id2, and flag, and are consecutive in their dates should all have the same rank. And that rank value should not be used again for any other 'group' of records.

Here is the code to generate a temp table with this structure:

if object_id('tempdb..#temp1') is not null drop table #temp1
CREATE TABLE #temp1 (id1 INT, id2 int, flag varchar(10), startdate DATETIME, enddate DATETIME)
INSERT INTO #temp1 values
(1, 1, 'y', '2007-01-10', '2007-02-12'),
(1, 1, 'y', '2007-02-13', '2007-08-04'),
(1, 1,'y', '2007-08-05', '2008-10-04'),
(1, 1,'n', '2008-10-05', '2008-11-14'),
(1, 1,'n', '2008-11-15', '2008-12-02'),
(1, 1,'n', '2008-12-08', '2008-12-20'),
(2, 2,'y', '2012-01-10', '2012-02-12'),
(2, 2,'y', '2012-02-13', '2012-08-04'),
(2, 3,'y', '2012-01-10', '2012-02-14'),
(2, 4,'y', '2012-08-14', '2013-01-10'),
(2, 4,'y', '2013-01-15', '2013-01-26'),
(2, 4,'y', '2013-01-27', '2013-02-04'),
(2, 4,'n', '2016-03-14', '2016-04-12')

Thanks in advance for any help.

CodePudding user response:

Try the following:

WITH T AS 
(
  SELECT *,
    IIF(
        DATEDIFF(DAY, 
                 LAG(enddate, 1, startdate-1) OVER (PARTITION BY id1, id2 ORDER BY enddate),
                 startdate)=1, 
        0, 1) AS chk
  FROM #temp1
)
SELECT id1, id2, flag, startdate, enddate, 
  DENSE_RANK() OVER (ORDER BY id1, id2, flag DESC, grp) AS rank_t
FROM
(
  SELECT *, 
  SUM(CHK) OVER (PARTITION BY id1, id2 ORDER BY enddate) AS grp 
  FROM T
) Groups
ORDER BY id1, id2, startdate

See a demo.

In the CTE T, we check if the date difference between the start date and the previous end date is equal to one, and set a value of zero if that condition is met and a value of one if that condition is not met.

Now, using a running sum of the chk value populated in the CTE, we can define unique groups for the consecutive rows.

At the end, we used DENSE_RANK function using the order (id1, id2, flag DESC, grp) to generate the required ranks.

CodePudding user response:

Same logic as existing answer... just done as 2 CTEs (which I find clearer) than a combination of CTE Sub-query.

with cte1 as (
    select *
        --  Identify if there is a gap between the current startdate and the previous enddate
        , case when lag(enddate,1,dateadd(day,-1,startdate)) over (partition by id1, id2, flag order by startdate asc) = dateadd(day,-1,startdate) then 0 else 1 end DateGap
    from #temp1
), cte2 as (
    select *
        -- Sum every time a gap is detected to generate a new partition
        , sum(DateGap) over (order by startdate asc) DateGapSum
    from cte1
)
select id1, id2, flag, startdate, enddate
    -- Use dense_rank to generate the ranking where ties are allocated the same value
    , dense_rank() over (order by id1 asc, id2 asc, flag desc, DateGapSum asc) rank_t
from cte2
order by id1, id2, startdate;
  • Related