There's two tables: A) all the valid dates in a time frame B) list of events id and respective date in which it occurred.
dt_util |
---|
2022-05-01 |
2022-05-02 |
2022-05-04 |
2022-05-05 |
2022-05-06 |
2022-05-07 |
id_event | dt_event |
---|---|
1 | 2022-05-01 |
2 | 2022-05-01 |
3 | 2022-05-01 |
1 | 2022-05-02 |
3 | 2022-05-02 |
1 | 2022-05-04 |
2 | 2022-05-04 |
3 | 2022-05-04 |
1 | 2022-05-05 |
2 | 2022-05-05 |
1 | 2022-05-07 |
2 | 2022-05-07 |
The desired result is to find the minimum date after the last "missing" (NULL) date for each ID, something like this:
id_event | min_dt_event |
---|---|
1 | 2022-05-07 |
2 | 2022-05-07 |
3 | 2022-05-01 |
I've been trying using FIRST_VALUE and ROW_NUMBER but no luck so far. I know it might be possible going through a loop for both tables, but it's not feasible with the real dataset.
-- Creating temp tables
DROP TABLE IF EXISTS #TEMP_EVENTS
DROP TABLE IF EXISTS #TEMP_DATES
CREATE TABLE #TEMP_EVENTS (
id_event smallint,
dt_event date
)
CREATE TABLE #TEMP_DATES (
dt_util date
)
INSERT INTO #TEMP_EVENTS VALUES
(1, '2022-05-01'),
(2, '2022-05-01'),
(3, '2022-05-01'),
(1, '2022-05-02'),
(3, '2022-05-02'),
(1, '2022-05-04'),
(2, '2022-05-04'),
(3, '2022-05-04'),
(1, '2022-05-05'),
(2, '2022-05-05'),
(1, '2022-05-07'),
(2, '2022-05-07')
INSERT INTO #TEMP_DATES VALUES
('2022-05-01'),
('2022-05-02'),
('2022-05-04'),
('2022-05-05'),
('2022-05-06'),
('2022-05-07')
-- Creating fictional table where all ids matches all dates
DROP TABLE IF EXISTS #TEMP_ID_EVENTS
SELECT DISTINCT ID_EVENT, DT_UTIL
INTO #TEMP_ALL_DATES
FROM #TEMP_EVENTS A
LEFT JOIN #TEMP_DATES B
ON 1 = 1
-- Minimum date after a null date
SELECT
A.ID_EVENT, [DT_UTIL], [DT_EVENT],
FIRST_VALUE([DT_UTIL]) OVER (PARTITION BY A.[ID_EVENT] ORDER BY [DT_EVENT] DESC) AS MIN_DATE
FROM
#TEMP_ALL_DATES A
LEFT JOIN
#TEMP_EVENTS B
ON
A.id_event = B.id_event
AND A.dt_util = B.dt_event
WHERE dt_event IS NULL
CodePudding user response:
The sub-query using row_number()
to generate a running number for dates in #TEMP_DATES
.
Then by joining to that sub-query on date
and using lag()
to check for row_number rn
, it should be different of 1 if there isn't any missing date.
Lastly just get the max
to get the date of the "last missing null"
with cte as
(
select *,
flag = case when d.rn <> 1
and lag(d.rn) over(partition by e.id_event
order by e.dt_event) <> d.rn - 1
then 1
else 0
end
from #TEMP_EVENTS e
inner join
(
select rn = row_number() over (order by dt_util),
dt_util
from #TEMP_DATES
) d on e.dt_event = d.dt_util
)
select id_event,
dt_event = max(case when flag = 1 then dt_event end)
from cte
group by id_event
Result:
id_event | dt_event |
---|---|
1 | 2022-05-07 |
2 | 2022-05-07 |
3 | null |