I have 2 tables with epoch values. One with multiple samples per minute such as:
id | First_name | epoch_time |
---|---|---|
1 | Paul | 1650317420 |
2 | Jeff | 1650317443 |
3 | Raul | 1650317455 |
And one with 1 sample per minute:
id | Home | epoch_time |
---|---|---|
1 | New York | 1650317432 |
What I would like to do is join on the closest timestamp between the two tables. Ideally, finding the closest values between tables 1 and 2 and then populating a field from table 1 and 2. Id like to populate the 'Home' field and keep the rest of the records from table 1 as is, such as:
id | Name | Home | epoch_time |
---|---|---|---|
1 | Paul | New York | 1650317420 |
2 | Jeff | New York | 1650317443 |
3 | Raul | New York | 1650317455 |
The problem is the actual join. The ID is not unique hence why I need to not only join on ID but also scan for the closest epoch time between the 2 tables. I cannot use correlated subqueries, since Presto doesn't support correlated subqueries.
CodePudding user response:
Answered my own question. It was as simple as first adding some offset such as a LEAD() between each minute sample and then using a BETWEEN in the join between the tables on the current minute sample looking ahead 59 seconds. Such that:
WITH tbl1 AS (
SELECT
*
FROM table_1
),
tbl2 AS (
SELECT
*,
LEAD(epoch_time) OVER (
PARTITION BY
name,
home
ORDER BY
epoch_time
) - 1 AS next_time
FROM table_2
)
SELECT
t1.Id,
t1.Name,
t2.Home,
t1.epoch_time
FROM tbl1 t1
LEFT JOIN tbl2 t2
ON t1.Id = t2.Id
AND t1.epoch_time BETWEEN t2.epoch_time AND t2.next_time