Suppose I have a table (DeliveryEvent) like this, grouped by delivery_id, where event_type 2 is a 'scheduled' event, 3 is an 'unscheduled' event, and 4 is a 'completed' event:
id | created | event_type | delivery_id | extra |
---|---|---|---|---|
1 | 2022-10-27 18:04 | 2 | 10005 | |
2 | 2022-10-27 19:00 | 3 | 10005 | {"couldn't deliver"} |
3 | 2022-10-27 19:20 | 2 | 10005 | |
4 | 2022-10-27 20:30 | 3 | 10005 | {"timeout"} |
5 | 2022-10-27 21:15 | 2 | 10005 | |
6 | 2022-10-27 22:40 | 3 | 10005 | {"timeout"} |
7 | 2022-10-27 22:55 | 2 | 10005 | |
8 | 2022-10-27 23:00 | 4 | 10005 | |
...
I need a result table that, for each unscheduled event caused by a 'timeout', includes the scheduled event that occurred just before that timeout, so that I can get the duration between the scheduled and the unscheduled event.
With the SELECT below, I get every combination of a timeout-unscheduled event with all of the earlier scheduled events:
SELECT
scheduled.id as scheduled_id,
scheduled.created as scheduled_time,
scheduled.event_type as scheduled_event,
scheduled.delivery_id as delivery_id,
unscheduled.id as unscheduled_id,
unscheduled.created as unscheduled_time,
unscheduled.event_type as unscheduled_event,
unscheduled.extra as extra
FROM
delivery_event scheduled
JOIN
delivery_event unscheduled ON scheduled.delivery_id = 10005
AND unscheduled.delivery_id = 10005
AND unscheduled.event_type = 3
AND scheduled.event_type = 2
AND scheduled.created < unscheduled.created
-- timeout test as posted; the exact predicate depends on the type of extra
AND unscheduled.extra->>'timeout'
scheduled_id | scheduled_time | scheduled_event | delivery_id | unscheduled_id | unscheduled_time | unscheduled_event | extra |
---|---|---|---|---|---|---|---|
5 | 2022-10-27 21:15 | 2 | 10005 | 6 | 2022-10-27 22:40 | 3 | {"timeout"} |
3 | 2022-10-27 19:20 | 2 | 10005 | 6 | 2022-10-27 22:40 | 3 | {"timeout"} |
1 | 2022-10-27 18:04 | 2 | 10005 | 6 | 2022-10-27 22:40 | 3 | {"timeout"} |
3 | 2022-10-27 19:20 | 2 | 10005 | 4 | 2022-10-27 20:30 | 3 | {"timeout"} |
1 | 2022-10-27 18:04 | 2 | 10005 | 4 | 2022-10-27 20:30 | 3 | {"timeout"} |
but instead I'd like to only have the following result:
scheduled_id | scheduled_time | scheduled_event | delivery_id | unscheduled_id | unscheduled_time | unscheduled_event | extra |
---|---|---|---|---|---|---|---|
5 | 2022-10-27 21:15 | 2 | 10005 | 6 | 2022-10-27 22:40 | 3 | {"timeout"} |
3 | 2022-10-27 19:20 | 2 | 10005 | 4 | 2022-10-27 20:30 | 3 | {"timeout"} |
That is, only the scheduled events that occurred right before the unscheduled ones: the last scheduled event before each unscheduled event due to timeout.
CodePudding user response:
Try this:
SELECT a.*
FROM (
SELECT delivery_id
, lag(id, 1) OVER w AS scheduled_id
, lag(created, 1) OVER w AS scheduled_time
, lag(event_type, 1) OVER w AS scheduled_event
, id AS unscheduled_id
, created AS unscheduled_time
, event_type AS unscheduled_event
, extra
FROM delivery_event
WHERE delivery_id = 10005
WINDOW w AS (PARTITION BY delivery_id ORDER BY created ROWS BETWEEN 1 PRECEDING AND CURRENT ROW)
) AS a
WHERE a.scheduled_event = 2
AND a.unscheduled_event = 3
-- as posted; '{"timeout"}' is not valid JSON, so adapt this test to the actual type of extra
AND a.extra = '{"timeout"}' :: json
CodePudding user response:
I can think of two solutions for this. The easiest is probably a DISTINCT ON clause, which drops all rows with a duplicate unscheduled_id, combined with an ORDER BY that sorts scheduled.created in descending order so that the row kept for each unscheduled event is the latest scheduled one.
SELECT DISTINCT ON (unscheduled.id)
scheduled.id as scheduled_id,
scheduled.created as scheduled_time,
scheduled.event_type as scheduled_event,
scheduled.delivery_id as delivery_id,
unscheduled.id as unscheduled_id,
unscheduled.created as unscheduled_time,
unscheduled.event_type as unscheduled_event,
unscheduled.extra as extra
FROM
delivery_event scheduled
JOIN
delivery_event unscheduled ON scheduled.delivery_id = 10005
AND unscheduled.delivery_id = 10005
AND unscheduled.event_type = 3
AND scheduled.event_type = 2
AND scheduled.created < unscheduled.created
-- timeout test as posted; the exact predicate depends on the type of extra
AND unscheduled.extra->>'timeout'
-- the DISTINCT ON expression has to come first in the ORDER BY
ORDER BY unscheduled.id, scheduled.created DESC
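To see why the ORDER BY matters: PostgreSQL keeps the first row of each DISTINCT ON group in ORDER BY order, and it requires the DISTINCT ON expressions to match the leftmost ORDER BY expressions. Here is a minimal sketch of just that behaviour. The inline `pairs` list is hypothetical; it only mirrors the ids and times of the five joined rows from the question, not the real table.

```sql
-- Hypothetical inline data: the five (unscheduled, scheduled) pairs
-- that the plain join in the question returns.
WITH pairs (unscheduled_id, scheduled_id, scheduled_time) AS (
    VALUES
        (6, 5, timestamp '2022-10-27 21:15'),
        (6, 3, timestamp '2022-10-27 19:20'),
        (6, 1, timestamp '2022-10-27 18:04'),
        (4, 3, timestamp '2022-10-27 19:20'),
        (4, 1, timestamp '2022-10-27 18:04')
)
SELECT DISTINCT ON (unscheduled_id)
       unscheduled_id,
       scheduled_id,
       scheduled_time
FROM pairs
-- Group key first, then newest scheduled event first within each group,
-- so the row that survives is the latest scheduled event.
ORDER BY unscheduled_id, scheduled_time DESC;
```

On this data the query keeps scheduled 3 for unscheduled 4 and scheduled 5 for unscheduled 6, which matches the desired result in the question.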
An alternative solution would be to use a left lateral join such that only the latest scheduled event will be joined.
SELECT
scheduled.id as scheduled_id,
scheduled.created as scheduled_time,
scheduled.event_type as scheduled_event,
scheduled.delivery_id as delivery_id,
unscheduled.id as unscheduled_id,
unscheduled.created as unscheduled_time,
unscheduled.event_type as unscheduled_event,
unscheduled.extra as extra
FROM
delivery_event unscheduled
LEFT JOIN LATERAL (
SELECT scheduled.id,
scheduled.created,
scheduled.event_type,
scheduled.delivery_id
FROM delivery_event scheduled
WHERE scheduled.delivery_id = 10005
AND scheduled.event_type = 2
AND scheduled.created < unscheduled.created
ORDER BY scheduled.created DESC
LIMIT 1) scheduled ON TRUE
WHERE unscheduled.delivery_id = 10005
AND unscheduled.event_type = 3
-- timeout test as posted; the exact predicate depends on the type of extra
AND unscheduled.extra->>'timeout'
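Since the stated goal is the duration between the two events, here is a self-contained sketch of the lateral version that also computes it. Two things are assumptions on my part: the sample rows are inlined as a CTE named delivery_event so the query runs without the real table, and extra is treated as a text[] column (the question does not say what type it is), so the timeout test is written with ANY.

```sql
-- Sample rows from the question, inlined so the sketch runs on its own.
-- ASSUMPTION: extra is text[]; adapt the timeout test if it is json/jsonb.
WITH delivery_event (id, created, event_type, delivery_id, extra) AS (
    VALUES
        (1, timestamp '2022-10-27 18:04', 2, 10005, NULL::text[]),
        (2, timestamp '2022-10-27 19:00', 3, 10005, ARRAY['couldn''t deliver']),
        (3, timestamp '2022-10-27 19:20', 2, 10005, NULL),
        (4, timestamp '2022-10-27 20:30', 3, 10005, ARRAY['timeout']),
        (5, timestamp '2022-10-27 21:15', 2, 10005, NULL),
        (6, timestamp '2022-10-27 22:40', 3, 10005, ARRAY['timeout']),
        (7, timestamp '2022-10-27 22:55', 2, 10005, NULL),
        (8, timestamp '2022-10-27 23:00', 4, 10005, NULL)
)
SELECT
    scheduled.id      AS scheduled_id,
    scheduled.created AS scheduled_time,
    unscheduled.id      AS unscheduled_id,
    unscheduled.created AS unscheduled_time,
    -- timestamp - timestamp yields an interval
    unscheduled.created - scheduled.created AS duration
FROM delivery_event AS unscheduled
LEFT JOIN LATERAL (
    -- Latest scheduled event strictly before this unscheduled event
    SELECT s.id, s.created
    FROM delivery_event AS s
    WHERE s.delivery_id = unscheduled.delivery_id
      AND s.event_type = 2
      AND s.created < unscheduled.created
    ORDER BY s.created DESC
    LIMIT 1
) AS scheduled ON TRUE
WHERE unscheduled.event_type = 3
  AND 'timeout' = ANY (unscheduled.extra)
ORDER BY unscheduled.created;
```

On the sample rows this gives two rows: scheduled 3 with unscheduled 4 (duration 01:10:00) and scheduled 5 with unscheduled 6 (duration 01:25:00).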
One word of warning about joining tables like this: although joining to only the one relevant row is very convenient, it tends to come with significant performance drawbacks, so make sure to check whether the performance of either method is satisfactory for your particular circumstances.
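If the lateral subquery does turn out to be slow, one thing that might be worth trying is a composite index covering the lookup columns, then checking the plan with EXPLAIN. This is only a sketch: the scratch table and the index name are hypothetical, the text[] type for extra is an assumption, and whether the planner actually uses such an index depends on your data and PostgreSQL version.

```sql
-- Hypothetical scratch table mirroring the columns shown in the question.
CREATE TEMP TABLE delivery_event_demo (
    id          integer PRIMARY KEY,
    created     timestamp NOT NULL,
    event_type  integer NOT NULL,
    delivery_id integer NOT NULL,
    extra       text[]              -- assumed type
);

-- Equality columns first, then the column used for the range test and sort.
CREATE INDEX delivery_event_demo_lookup_idx
    ON delivery_event_demo (delivery_id, event_type, created);

-- Inspect the plan for the "latest scheduled event before each
-- unscheduled event" lookup.
EXPLAIN
SELECT s.id, s.created
FROM delivery_event_demo AS u
LEFT JOIN LATERAL (
    SELECT id, created
    FROM delivery_event_demo
    WHERE delivery_id = u.delivery_id
      AND event_type = 2
      AND created < u.created
    ORDER BY created DESC
    LIMIT 1
) AS s ON TRUE
WHERE u.delivery_id = 10005
  AND u.event_type = 3;
```

On the real table, EXPLAIN (ANALYZE) run before and after adding such an index would show whether it makes a difference for your row counts.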