Home > Mobile >  Joining on closest timestamp? Presto SQL
Joining on closest timestamp? Presto SQL

Time:04-22

I have 2 tables with epoch values. One with multiple samples per minute such as:

id First_name epoch_time
1 Paul 1650317420
2 Jeff 1650317443
3 Raul 1650317455

And one with 1 sample per minute:

id Home epoch_time
1 New York 1650317432

What I would like to do is join on the closest timestamp between the two tables. Ideally, finding the closest values between tables 1 and 2 and then populating a field from table 1 and 2. Id like to populate the 'Home' field and keep the rest of the records from table 1 as is, such as:

id Name Home epoch_time
1 Paul New York 1650317420
2 Jeff New York 1650317443
3 Raul New York 1650317455

The problem is the actual join. The ID is not unique hence why I need to not only join on ID but also scan for the closest epoch time between the 2 tables. I cannot use correlated subqueries, since Presto doesn't support correlated subqueries.

CodePudding user response:

Answered my own question. It was as simple as first adding some offset such as a LEAD() between each minute sample and then using a BETWEEN in the join between the tables on the current minute sample looking ahead 59 seconds. Such that:

WITH tbl1 AS (

    SELECT
        *
    FROM table_1

),

tbl2 AS (

    SELECT
        *,
        LEAD(epoch_time) OVER (
            PARTITION BY
                name,
                home
            ORDER BY
                epoch_time
        ) - 1 AS next_time
    FROM table_2
)

SELECT
    t1.Id,
    t1.Name,
    t2.Home,
    t1.epoch_time
FROM tbl1 t1
LEFT JOIN tbl2 t2
    ON t1.Id = t2.Id
    AND t1.epoch_time BETWEEN t2.epoch_time AND t2.next_time
  • Related