Home > Software engineering >  Replace multiple LEFT JOINS on the same table
Replace multiple LEFT JOINS on the same table

Time:04-26

I have a SQL query that consists of multiple left joins to the same table and has a very bad performance. The only difference is that each left join searches for a different feature in the table, which is then set as a new column.

SELECT
    t0.*,
    m1.value        AS "Bestellnummer"
    m2.value        AS "Bestellposition",
    m3.value        AS "Aussteller",
    m4.value        AS "Baugruppe",
    m5.value        AS "Artikelnummer",
    m6.value        AS "Zertifikatsnummer",
    m7.value        AS "Arbeitspaket",
    m8.value        AS "Chargennummer",
    m9.value        AS "Systemnummer",
    m10.value       AS "Armaturnummer" 
    
FROM        table_1 t0

LEFT JOIN   table_2 m1  ON (m1.dokument_id = t0.id AND m1.merkmal = 'Bestellnummer') 
LEFT JOIN   table_2 m2  ON (m2.dokument_id = t0.id AND m2.merkmal = 'Bestellposition')
LEFT JOIN   table_2 m3  ON (m3.dokument_id = t0.id AND m3.merkmal = 'Aussteller')
LEFT JOIN   table_2 m4  ON (m4.dokument_id = t0.id AND m4.merkmal = 'Baugruppe')
LEFT JOIN   table_2 m5  ON (m5.dokument_id = t0.id AND m5.merkmal = 'Artikelnummer')
LEFT JOIN   table_2 m6  ON (m6.dokument_id = t0.id AND m6.merkmal = 'Zertifikatsnummer')
LEFT JOIN   table_2 m7  ON (m7.dokument_id = t0.id AND m7.merkmal = 'Arbeitspaket')
LEFT JOIN   table_2 m8  ON (m8.dokument_id = t0.id AND m8.merkmal = 'Chargennummer')
LEFT JOIN   table_2 m9  ON (m9.dokument_id = t0.id AND m9.merkmal = 'Systemnummer')
LEFT JOIN   table_2 m10 ON (m10.dokument_id = t0.id AND m10.merkmal = 'Armaturnummer')

I have already tried to replace all the left joins with the following statement:

SELECT
    t0.id, 
    t0.document_id,
    t0.vault_id, 
    t0.document_type_id, 
    t0.state,  
    t0.name, 
    to_number(to_char(unixts_to_date(t0.file_create_date), 'YYYYMMDD')) AS file_create_date,
    to_number(to_char(unixts_to_date(t0.file_change_date), 'YYYYMMDD')) AS file_change_date,    
    
    t1."Bestellnummer",
    t1."Aussteller",
    t1."Baugruppe",
    t1."Artikelnummer",
    t1."Zertifikatsnummer",
    t1."Arbeitspaket",
    t1."Chargennummer",
    t1."Systemnummer",
    t1."Armaturnummer"
    
FROM    table_1 t0
LEFT JOIN 
    (
        SELECT 
            dokument_id,
            max(case when merkmal = 'Bestellnummer' then value else null end )      as "Bestellnummer",
            max(case when merkmal = 'Bestellposition' then value else null end )    as "Bestellposition",
            max(case when merkmal = 'Aussteller' then value else null end )         as "Aussteller",
            max(case when merkmal = 'Baugruppe' then value else null end )          as "Baugruppe",
            max(case when merkmal = 'Artikelnummer' then value else null end )      as "Artikelnummer",
            max(case when merkmal = 'Zertifikatsnummer' then value else null end )  as "Zertifikatsnummer",
            max(case when merkmal = 'Arbeitspaket' then value else null end )       as "Arbeitspaket",
            max(case when merkmal = 'Chargennummer' then value else null end )      as "Chargennummer",
            max(case when merkmal = 'Systemnummer' then value else null end )       as "Systemnummer",
            max(case when merkmal = 'Armaturnummer' then value else null end )      as "Armaturnummer"    
       
        FROM 
            table_2
            
        GROUP BY dokument_id

    )  t1   on t1.dokument_id = t0.id 

But this does not work, since a document of the table tabe_1 for example can have 2 times the characteristic 'article number' with different values.

Does anyone have an idea?

Thank you!

CodePudding user response:

Ouch. It looks like someone in your organization has unfortunately "discovered" the Entity-Attribute-Value data model, which promises flexibility but is hardly scalable at all. I don't know if there is an easy trick, and you did not provide useful details like an explain plan output.

Your problem is made worse by the fact that you have no WHERE condition on table_1. So, guessing here, you are probably looking at a HASH_JOIN operation using table_1 as the "build table"[1] and then 9 loops using table_2 as the "probe table". Worse still, it's likely that the build table from table_1 does not fit in PGA memory, making the whole process way less efficient.

First, you should have indexes on table_1(id) and table_2(dokument_id, merkmal, value). Including value in the table_2 index can be very helpful in a situation like this, because it means Oracle never has to read the blocks from table_2 -- everything it needs from that table is already in the index (this is called a "covering index"). That index is likely to be smaller than the table.

Second, if that doesn't help, consider restructuring table_1 as an Index-Organized Table. That, along with the covering index on table_2 might make a sort-merge join viable.

No guarantees here. You have a bad data model here for the query you are trying to execute. Clever SQL can only go so far. The real answer is to fix your data model. Move each the attribute value from table_2 into a new column on table_1.

P.S. If you do have a WHERE clause on table_1 and just omitted it from your question for some reason, you should update your question because it changes the answer a lot.

P.P.S. If you are still stuck, update your question with DBMS_XPLAN output with statistics.

CodePudding user response:

Have you considered making a temp table and then doing updates to it utilizing the conditions in your left joins on each update statement, only as inner joins?

  • Related