I believe my slow query is due to an inner join I'm trying to perform. Instead of:
SELECT ot.email, mt.old_email, mt.new_email
FROM dbo.MainTable AS mt
INNER JOIN dbo.OtherTable AS ot ON ot.email = mt.old_email
OR ot.email = mt.new_email
What would be a more efficient join to run this? Using an Union all statement seems tedious, but maybe I'm wrong. I also tried this but it doesn't seem to work:
SELECT ot.email, mt.old_email, mt.new_email
FROM dbo.MainTable AS mt
INNER JOIN dbo.OtherTable AS ot ON ot.email = (case when mt.old_email !=ot.email THEN mt.new_email)
CodePudding user response:
First, are you sure this is Snowflake and not MS SQL Server?
Regardless, it’s not an efficient query wise to OR the only join condition. I would write as:
SELECT ot.email, mt.old_email, mt.new_email
FROM dbo.MainTable mt,
dbo.OtherTable ot
Where ot.email = mt.old_email
Union
SELECT ot.email, mt.old_email, mt.new_email
FROM dbo.MainTable mt,
dbo.OtherTable ot
Where ot.email = mt.new_email;
CodePudding user response:
Depending on your "data" different blocks of SQL will give different results:
for example I made some data:
with MainTable(old_email,new_email) as (
select * from values
('abc','East'),
('abc','East')
), OtherTable(email) as (
select * from values
('abc'),
('East')
)
your SQL:
SELECT
ot.email
,mt.old_email
,mt.new_email
FROM MainTable AS mt
JOIN OtherTable AS ot
ON ot.email = mt.old_email
OR ot.email = mt.new_email
gives:
OLD_EMAIL | NEW_EMAIL | |
---|---|---|
abc | abc | East |
East | abc | East |
abc | abc | East |
East | abc | East |
which makes sense to me.
Jim's SQL gives:
SELECT ot.email, mt.old_email, mt.new_email
FROM MainTable mt,
OtherTable ot
Where ot.email = mt.old_email
Union
SELECT ot.email, mt.old_email, mt.new_email
FROM MainTable mt,
OtherTable ot
Where ot.email = mt.new_email;
OLD_EMAIL | NEW_EMAIL | |
---|---|---|
abc | abc | East |
East | abc | East |
I would normal say split the logic up, to discreate branches of the OR and use a UNION ALL to avoid the costly deduplication step on a naked UNION.
which for this data changing Jim's UNION to a UNION ALL appears to work:
SELECT ot.email, mt.old_email, mt.new_email
FROM MainTable mt,
OtherTable ot
Where ot.email = mt.old_email
Union ALL
SELECT ot.email, mt.old_email, mt.new_email
FROM MainTable mt,
OtherTable ot
Where ot.email = mt.new_email;
but really we should split it to three blocks, both A & B, just A, just B and union those:
thus:
SELECT ot.email, mt.old_email, mt.new_email
FROM MainTable mt
JOIN OtherTable ot
ON ot.email = mt.old_email AND ot.email = mt.new_email
Union ALL
SELECT ot.email, mt.old_email, mt.new_email
FROM MainTable mt
JOIN OtherTable ot
ON ot.email = mt.old_email AND ot.email <> mt.new_email
UNION ALL
SELECT ot.email, mt.old_email, mt.new_email
FROM MainTable mt
JOIN OtherTable ot
ON ot.email <> mt.old_email AND ot.email = mt.new_email;
gives:
OLD_EMAIL | NEW_EMAIL | |
---|---|---|
abc | abc | East |
abc | abc | East |
East | abc | East |
East | abc | East |
If you can avoid string comparisons and use number's) that should give you a small speed improvement also, but might require processing that moves the cost elsewhere in you processing stack.