Home > Software engineering >  Match by id and date between 2 tables, OR last known match id
Match by id and date between 2 tables, OR last known match id

Time:01-13

Trying to make work following:

  • T1: Take id per dt where name = A which is most recent by load_id
    • Notice 2 records on 5-Jan-23, with load_id 2 and 3 => take load_id = 3
  • T2: And display corresponding id per dt for each param rows, with most recent load_id
    • Notice only load_id = 13 is kept on 05-Jan-23
  • T2: In case of date now available in T1, keep T2 rows matching last known id

enter image description here

Fiddle: https://dbfiddle.uk/-JO16GSj

My SQL seems a bit wild. Can it be simplified?

SELECT t2.dt, t2.param, t2.load_id, t2.id FROM
(SELECT 
  dt, 
  param,
  load_id,
  MAX(load_id) OVER (PARTITION BY dt, param) AS max_load_id,
  id
FROM table2) t2

LEFT JOIN
(SELECT * FROM
(SELECT
  dt, 
  id,
  load_id,
  MAX(load_id) OVER (PARTITION BY dt) AS max_load_id
FROM table1
WHERE name = 'A') t1_prep
WHERE t1_prep.load_id = t1_prep.max_load_id) t1

ON t1.dt = t2.dt and t1.id = t2.id
  
WHERE t2.load_id = t2.max_load_id

ORDER BY 1, 2

CodePudding user response:

Your query can be rewritten as:

SELECT t2.*
FROM   ( SELECT *
         FROM   table2
         ORDER BY RANK() OVER (PARTITION BY dt, param ORDER BY load_id DESC)
         FETCH FIRST ROW WITH TIES
       ) t2
       LEFT OUTER JOIN
       ( SELECT *
         FROM   table1
         WHERE  name = 'A'
         ORDER BY RANK() OVER (PARTITION BY dt ORDER BY load_id DESC)
         FETCH FIRST ROW WITH TIES
       ) t1
       ON t1.dt = t2.dt and t1.id = t2.id
ORDER BY t2.dt, t2.param

However, since the columns from t1 are never output and are joined with a LEFT OUTER JOIN (and will only output single rows per dt) then it is irrelevant whether a match is found or not with t1 and that table can be eliminated from the query simplifying it to:

SELECT *
FROM   (
  SELECT *
  FROM   table2
  ORDER BY RANK() OVER (PARTITION BY dt, param ORDER BY load_id DESC)
  FETCH FIRST ROW WITH TIES
)
ORDER BY dt, param;

or using your query:

SELECT dt, param, load_id, id
FROM   (
  SELECT dt, param, load_id, id,
         MAX(load_id) OVER (PARTITION BY dt, param) AS max_load_id
  FROM   table2
)
WHERE  load_id = max_load_id
ORDER BY dt, param

Which, for the sample data, all output:

DT PARAM LOAD_ID ID
04-JAN-23 0 11 1
04-JAN-23 1 11 1
05-JAN-23 0 13 3
05-JAN-23 1 13 3
06-JAN-23 0 14 3
06-JAN-23 1 14 3
07-JAN-23 1 14 3
08-JAN-23 1 15 3
09-JAN-23 0 16 3
09-JAN-23 1 16 3
10-JAN-23 0 17 3
10-JAN-23 1 17 3

fiddle

  • Related