So I have a list as follows:
Table 1
ID TIMESTAMP GROUP
001 2021-04-01 12:51:12.063 A
001 2021-04-04 12:51:12.063 G
001 2021-04-14 10:47:03.022 B
002 2021-01-13 09:46:23.012 C
003 2021-09-10 03:32:53.043 D
004 2021-04-13 01:12:54.056 D
004 2021-04-13 11:12:26.054 A
004 2021-04-13 21:53:36.023 D
005 2021-04-01 13:53:13.023 F
005 2021-04-11 13:53:13.023 J
003 2022-04-13 20:32:11.011 G
006 2021-08-13 20:32:11.011 G
And I also have a list of events:
TABLE 2
EVENT ID TIMESTAMP
eventA 001 2021-04-02 12:51:12.063
eventB 001 2021-04-13 12:51:12.063
eventA 002 2021-04-01 12:51:12.063
eventA 002 2021-04-13 12:51:12.063
eventA 002 2021-04-14 12:51:12.063
eventA 003 2021-10-17 12:51:12.063
eventB 005 2021-04-10 12:51:12.063
eventB 005 2021-04-21 12:51:12.063
eventA 006 2021-05-01 20:32:11.011
And my goal here is for every event in TABLE 2, I want to join the most recent entry from table 1 based on ID. If there are no preceding entries in Table 1, though they exist, they should be null on the join.
So in short, for every row in Table 2, we need to find the most recent group for that ID based on timestamp.
Final Result
EVENT ID TIMESTAMP group
eventA 001 2021-04-02 12:51:12.063 A
eventB 001 2021-04-13 12:51:12.063 G
eventA 002 2021-04-01 12:51:12.063 NULL
eventA 002 2021-04-13 12:51:12.063 C
eventA 002 2021-04-14 12:51:12.063 C
eventA 003 2021-10-17 12:51:12.063 D
eventB 005 2021-04-10 12:51:12.063 F
eventB 005 2021-04-21 12:51:12.063 J
eventA 006 2021-05-01 20:32:11.011 NULL
CodePudding user response:
So if you do a LEFT JOIN based on prior (equal?) timestamps and then prune the overmatches to just the most recent with a QUALIFY this can be done with:
SELECT t2.event
t2.id
t2.timestamp
t1.group
FROM table2 AS t2
LEFT JOIN table1 AS t1
ON t2.id = t1.id AND t2.timestamp >= t1.timestamp
QUALIFY ROW_NUMBER() OVER (
PARTITON BY t2.id, t2.timestamp
ORDER BY t1.timestamp DESC NULLS LAST
) = 1
ORDER BY 1,2,3;
this will work as long as Table2 has no duplicate ID, Timestamp values
CodePudding user response:
Window functions with QUALIFY ROW_NUMBER() work to get the latest row as Simeon shows. I've found that for this type of join (often called an AsOf join) if the tables are very large this join, find the max timestamp and rejoin approach usually completes faster than using a window function:
select J."EVENT", J.ID, J."TIMESTAMP", "GROUP" from
(select * from T2,
lateral (select max(T1."TIMESTAMP") TS from T1 where T1.ID = T2.ID and T1.TIMESTAMP < T2."TIMESTAMP")) J
left join T1 on J.TS = T1."TIMESTAMP"
;