Home > front end >  Slow query performance and Inner join
Slow query performance and Inner join

Time:08-08

I believe my slow query is due to an inner join I'm trying to perform. Instead of:

SELECT ot.email, mt.old_email, mt.new_email
  FROM dbo.MainTable AS mt
  INNER JOIN dbo.OtherTable AS ot ON ot.email = mt.old_email
                                  OR ot.email = mt.new_email

What would be a more efficient join to run this? Using an Union all statement seems tedious, but maybe I'm wrong. I also tried this but it doesn't seem to work:

SELECT ot.email, mt.old_email, mt.new_email
  FROM dbo.MainTable AS mt
  INNER JOIN dbo.OtherTable AS ot ON ot.email = (case  when mt.old_email !=ot.email THEN mt.new_email)

CodePudding user response:

First, are you sure this is Snowflake and not MS SQL Server?

Regardless, it’s not an efficient query wise to OR the only join condition. I would write as:

SELECT ot.email, mt.old_email, mt.new_email
  FROM dbo.MainTable mt, 
       dbo.OtherTable ot
  Where  ot.email = mt.old_email 
Union
SELECT ot.email, mt.old_email, mt.new_email
  FROM dbo.MainTable  mt,
       dbo.OtherTable ot 
  Where ot.email = mt.new_email;
       

CodePudding user response:

Depending on your "data" different blocks of SQL will give different results:

for example I made some data:

with MainTable(old_email,new_email) as (
    select * from values
        ('abc','East'),
        ('abc','East')
), OtherTable(email) as (
    select * from values
        ('abc'),
        ('East')
)

your SQL:

SELECT 
    ot.email
    ,mt.old_email
    ,mt.new_email
  FROM MainTable AS mt
  JOIN OtherTable AS ot
    ON ot.email = mt.old_email
        OR ot.email = mt.new_email

gives:

EMAIL OLD_EMAIL NEW_EMAIL
abc abc East
East abc East
abc abc East
East abc East

which makes sense to me.

Jim's SQL gives:

SELECT ot.email, mt.old_email, mt.new_email
  FROM MainTable mt, 
       OtherTable ot
  Where  ot.email = mt.old_email 
Union
SELECT ot.email, mt.old_email, mt.new_email
  FROM MainTable  mt,
       OtherTable ot 
  Where ot.email = mt.new_email;
EMAIL OLD_EMAIL NEW_EMAIL
abc abc East
East abc East

I would normal say split the logic up, to discreate branches of the OR and use a UNION ALL to avoid the costly deduplication step on a naked UNION.

which for this data changing Jim's UNION to a UNION ALL appears to work:

SELECT ot.email, mt.old_email, mt.new_email
  FROM MainTable mt, 
       OtherTable ot
  Where  ot.email = mt.old_email 
Union ALL
SELECT ot.email, mt.old_email, mt.new_email
  FROM MainTable  mt,
       OtherTable ot 
  Where ot.email = mt.new_email;

but really we should split it to three blocks, both A & B, just A, just B and union those:

thus:

SELECT ot.email, mt.old_email, mt.new_email
FROM MainTable mt
JOIN OtherTable ot
    ON  ot.email = mt.old_email AND ot.email = mt.new_email
Union ALL
SELECT ot.email, mt.old_email, mt.new_email
FROM MainTable mt
JOIN OtherTable ot
    ON  ot.email = mt.old_email AND ot.email <> mt.new_email
UNION ALL
SELECT ot.email, mt.old_email, mt.new_email
FROM MainTable mt
JOIN OtherTable ot
    ON  ot.email <> mt.old_email AND ot.email = mt.new_email;

gives:

EMAIL OLD_EMAIL NEW_EMAIL
abc abc East
abc abc East
East abc East
East abc East

If you can avoid string comparisons and use number's) that should give you a small speed improvement also, but might require processing that moves the cost elsewhere in you processing stack.

  • Related