Home > database >  How optimize select with max subquery on the same table?
How optimize select with max subquery on the same table?

Time:11-17

We have many old selects like this:

SELECT
    tm."ID",tm."R_PERSONES",tm."R_DATASOURCE", ,tm."MATCHCODE",
    d.NAME   AS DATASOURCE,
    p.PDID
FROM TABLE_MAPPINGS tm,
     PERSONES p,
     DATASOURCES d,
     (select ID
      from TABLE_MAPPINGS
      where (R_PERSONES, MATCHCODE) 
            in (select 
                  R_PERSONES, MATCHCODE
                from TABLE_MAPPINGS
                where 
                    id in (select max(id) 
                           from TABLE_MAPPINGS 
                           group by MATCHCODE)
               )
     ) tm2                    
WHERE tm.R_PERSONES = p.ID
  AND tm.R_DATASOURCE=d.ID
  and tm2.id = tm.id;  

These are large tables, and queries take a long time. How to rebuild them?

Thank you

CodePudding user response:

You're querying table_mappings 3 times; how about doing it only once?

WITH
   tab_map
   AS
      (SELECT a.id,
              a.r_persones,
              a.matchcode,
              a.datasource,
              ROW_NUMBER ()
                 OVER (PARTITION BY a.matchcode ORDER BY a.id DESC) rn
         FROM table_mappings a)
SELECT tm.id,
       tm.r_persones,
       tm.matchcode,
       d.name AS datasource,
       p.pdid
  FROM tab_map tm
       JOIN persones p ON p.id = tm.r_persones
       JOIN datasources d ON d.id = tm.r_datasource
 WHERE tm.rn = 1

CodePudding user response:

You can query the table only once using something like (untested as you have not provided a minimal example of your create table statements or sample data):

SELECT *
FROM   (
  SELECT m.*,
         COUNT(CASE WHEN rnk = 1 THEN 1 END)
           OVER (PARTITION BY r_persones, matchcode) AS has_max_id
  FROM   (
    SELECT tm.ID,
           tm.R_PERSONES,
           tm.R_DATASOURCE,
           tm.MATCHCODE,
           d.NAME AS DATASOURCE,
           p.PDID,
           RANK() OVER (PARTITION BY tm.matchcode ORDER BY id DESC) As rnk
    FROM   TABLE_MAPPINGS tm
           INNER JOIN PERSONES p    ON tm.R_PERSONES = p.ID
           INNER JOIN DATASOURCES d ON tm.R_DATASOURCE = d.ID
  ) m
)
WHERE  has_max_id > 0;

First finding the maximum ID using the RANK analytic function and then finding all the relevant r_persones, matchcode pairs using conditional aggregation in a COUNT analytic function.

Note: you want to use the RANK or DENSE_RANK analytic functions to match the maximums as it can match multiple rows per partition; whereas ROW_NUMBER will only ever put a single row per partition first.

  • Related