Home > Back-end >  Finding the first occurrence after a null value for each id
Finding the first occurrence after a null value for each id

Time:05-05

There's two tables: A) all the valid dates in a time frame B) list of events id and respective date in which it occurred.

dt_util
2022-05-01
2022-05-02
2022-05-04
2022-05-05
2022-05-06
2022-05-07
id_event dt_event
1 2022-05-01
2 2022-05-01
3 2022-05-01
1 2022-05-02
3 2022-05-02
1 2022-05-04
2 2022-05-04
3 2022-05-04
1 2022-05-05
2 2022-05-05
1 2022-05-07
2 2022-05-07

The desired result is to find the minimum date after the last "missing" (NULL) date for each ID, something like this:

id_event min_dt_event
1 2022-05-07
2 2022-05-07
3 2022-05-01

I've been trying using FIRST_VALUE and ROW_NUMBER but no luck so far. I know it might be possible going through a loop for both tables, but it's not feasible with the real dataset.

    -- Creating temp tables
    DROP TABLE IF EXISTS #TEMP_EVENTS
    DROP TABLE IF EXISTS #TEMP_DATES
    CREATE TABLE #TEMP_EVENTS (
        id_event smallint,
        dt_event date
    )
    
    CREATE TABLE #TEMP_DATES (
        dt_util date
    )

    INSERT INTO #TEMP_EVENTS VALUES 
            (1, '2022-05-01'),
            (2, '2022-05-01'),
            (3, '2022-05-01'),
            (1, '2022-05-02'),
            (3, '2022-05-02'),
            (1, '2022-05-04'),
            (2, '2022-05-04'),
            (3, '2022-05-04'),
            (1, '2022-05-05'),
            (2, '2022-05-05'),
            (1, '2022-05-07'),
            (2, '2022-05-07')
            
    INSERT INTO #TEMP_DATES VALUES 
            ('2022-05-01'),
            ('2022-05-02'),
            ('2022-05-04'),
            ('2022-05-05'),
            ('2022-05-06'),
            ('2022-05-07')

    -- Creating fictional table where all ids matches all dates
    DROP TABLE IF EXISTS #TEMP_ID_EVENTS
    SELECT DISTINCT ID_EVENT, DT_UTIL 
    INTO #TEMP_ALL_DATES 
    FROM #TEMP_EVENTS A 
    LEFT JOIN #TEMP_DATES B
    ON 1 = 1
    
    -- Minimum date after a null date
    SELECT 
         A.ID_EVENT, [DT_UTIL], [DT_EVENT],
         FIRST_VALUE([DT_UTIL]) OVER (PARTITION BY A.[ID_EVENT] ORDER BY [DT_EVENT] DESC) AS MIN_DATE
    FROM 
        #TEMP_ALL_DATES A
    LEFT JOIN 
        #TEMP_EVENTS B
    ON  
        A.id_event = B.id_event
    AND A.dt_util  = B.dt_event
    WHERE dt_event IS NULL

CodePudding user response:

The sub-query using row_number() to generate a running number for dates in #TEMP_DATES.

Then by joining to that sub-query on date and using lag() to check for row_number rn, it should be different of 1 if there isn't any missing date.

Lastly just get the max to get the date of the "last missing null"

with cte as
(
    select *,
           flag = case when d.rn <> 1 
                       and  lag(d.rn) over(partition by e.id_event 
                                               order by e.dt_event) <> d.rn - 1
                       then 1
                       else 0
                       end
    from   #TEMP_EVENTS e
           inner join
           (
               select rn = row_number() over (order by dt_util),
                      dt_util
               from   #TEMP_DATES
           ) d             on e.dt_event = d.dt_util
)
select id_event, 
       dt_event = max(case when flag = 1 then dt_event end)
from   cte
group by id_event

Result:

id_event dt_event
1 2022-05-07
2 2022-05-07
3 null

db<>fiddle demo

  •  Tags:  
  • tsql
  • Related