Home > Software design >  Left join on single record
Left join on single record

Time:09-20

I have the next table that stores events: (simplified structure)

ID User Action Timestamp
12 user1 END 2022-01-01 05:00
43 user1 START 2022-01-01 04:00
54 user1 END 2022-01-01 03:00
13 user1 START 2022-01-01 02:00

I need to join 2 events in one row, so any START event is accompanied by the END event that comes after that.

So the result should be the next:

ID1 ID2 User Start Timestamp End Timestamp
13 54 user1 2022-01-01 02:00 2022-01-01 03:00
43 12 user1 2022-01-01 04:00 2022-01-01 05:00

Ideally, it should not have to many performance issues, as there could be a lot of records in the table.

I've tried the next query:

select 
  s.id as "ID1",
  e.id as "ID2",
  s.user,
  s.time as "Start Time",
  e.time as "End Time"
from Events s
left join Events e on s.user = e.user
where s.action = 'START'
  and e.action = 'END'
  and s.timestamp < e.timestamp

but it will also match the record 13 to record 12. Is it possible to join the left side to right only once? (keeping in mind that is should be the next END record time-wise?

Thanks

CodePudding user response:

We want to get the nearest timestamp of the END event for each START event.

I would go with the following approach:

  1. Get the minimum greater than zero timestamp difference for each of the START events.
  2. Now find the actual END event using the timedelta.

Assumptions

  1. At max we can have only one event which is not ended yet!
  2. For every START event, the timestamps will be unique. (Same goes for END event.
WITH closest_to_start AS (
    SELECT 
        s.id,
        MIN(TIMESTAMPDIFF(SECOND, s.timestamp, e.timestamp)) AS min_delta
    FROM Events AS s
    INNER JOIN Events AS e ON s.user = e.user
    WHERE s.action = 'START'
    AND e.action = 'END'
    GROUP BY s.id
    HAVING min_delta >= 0
)
SELECT s.id, 
    e.id
FROM Events AS s
OUTER JOIN closest_to_start ON closest_to_start.id = s.id
OUTER JOIN Events AS e ON e.id = s.id
WHERE s.action = 'START'
AND e.action = 'END'
AND 
(
    e.timestamp IS NULL
    OR
    TIMESTAMPDIFF(SECOND, s.timestamp, e.timestamp) = closest_to_start.min_delta
)

CodePudding user response:

Here is a PostgreSQL solution using lateral join. It might be working on HANA as no Postgres-specific features are used. The internal query selects the 'END' action for the same user that occurred soonest after the corresponding 'START'. Events that have started but not finished yet will have NULL values for "ID2" and "End timestamp".

create temporary table the_table(id integer, usr text, action text, ts timestamp);
insert into the_table values 
(12,'user1','END','2022-01-01 05:00'),(43,'user1','START','2022-01-01 04:00'),
(54,'user1','END','2022-01-01 03:00'),(13,'user1','START','2022-01-01 02:00');

select tx.id as "ID1", l.id as "ID2", tx.usr as "User", 
       tx.ts as "Start timestamp", l.ts as "End timestamp" 
from the_table as tx
left join lateral 
(
  select ti.id, ti.ts
  from the_table as ti 
  where ti.action = 'END' 
    and ti.ts > tx.ts 
    and ti.usr = tx.usr
  order by ti.ts - tx.ts
  limit 1
) as l on true  
where tx.action = 'START'
order by "Start timestamp"; 

CodePudding user response:

You can use the window function Lead.

with Daten 
as
(
Select 12 as ID, 'user1' as Benutzer, 'END' as action, '05:00' as Time
Union
Select 43 as ID, 'user1' as Benutzer, 'Start' as action, '04:00' as Time
Union
Select 54 as ID, 'user1' as Benutzer, 'END' as action, '03:00' as Time
Union
Select 13 as ID, 'user1' as Benutzer, 'Start' as action, '02:00' as Time
)
Select 
    *
from
    (
        Select
            *,
            lead(ID,1) over (order by number) as ID2,
            lead(action,1) over (order by number) as action2,
            lead(time,1) over (order by number) as time2
        from
            (
                Select 
                    *,
                    ROW_NUMBER() OVER(ORDER BY Benutzer,Time,action) as number
                from
                     Daten
            ) x
    ) y
where y.action = 'Start'

CodePudding user response:

The issue with your query above is that for each start event, there can be multiple end events, which occur after. However, you would like to choose the one that's 'closest' to the start event. You can achieve this by adding an additional aggregation.

Please find a HANA example (uses no HANA specific functionality):

SELECT
    S.ID ID1,
    S.USER,
    S.ACTION,
    S.TIMESTAMP START_TIME,
    MIN(E.TIMESTAMP) END_TIME
FROM TEST S
JOIN TEST E ON (
    s.USER = e.USER AND
    s.ACTION = 'START' AND
    e.ACTION = 'END' AND
    e.TIMESTAMP >= s.TIMESTAMP
)
GROUP BY S.ID, S.ACTION, S.USER, S.TIMESTAMP

If you need to have E.ID included, you will need to join it back to the result set. Note, that there may be multiple end events with the same timestamp, which you need to handle when joining back E.ID.

If you additionally would like to include START events without corresponding END event, you can use the following:

INSERT INTO TEST VALUES (13, 'user1', 'START', '2022-01-01 09:00')

SELECT
    S.ID ID1,
    S.USER,
    S.ACTION,
    S.TIMESTAMP START_TIME,
    MIN(E.TIMESTAMP) END_TIME
FROM TEST S
LEFT JOIN TEST E ON (
    s.USER = e.USER AND
    e.ACTION = 'END' AND
    e.TIMESTAMP >= s.TIMESTAMP
)
WHERE s.ACTION ='START'
GROUP BY S.ID, S.ACTION, S.USER, S.TIMESTAMP

CodePudding user response:

One way is a lateral join that picks the smallest "end" timestamp that is greater than the "start" timestamp:

select st.id as id1,
       en.id as id2,
       st."timestamp" as start_timestamp,
       en."timestamp" as end_timestamp
from events st
  left join lateral (
      select id, "timestamp"
      from events e
      where e."user" = st."user" 
        and e.action = 'END'
      and e.timestamp >= st.timestamp
      order by "timestamp" 
      fetch first 1 row only
  ) en on true
where st.action = 'START';

The above is standard ANSI SQL and works (at least) in Postgres.

In Postgres I would create an index on events ("user", "timestamp") where action = 'END' to make the lateral query fast.

  • Related