Is it optimal to use multiple joins in update query?-CodePudding

My update query checks whether the column “Houses” is null in any of the rows in my source table by joining the id between the target & source table (check query one). The column Houses being null in this case indicates that the row has expired; thus, I need to expire the row id in my target table and set the expired date. The query works fine, but I was wondering if it can be improved; I'm new to SQL, so I don't know if using two joins is the best way to accomplish the result I want. My update query will later be used against millions of rows. No columns has been indexed yet.

Query:


(Query one)
Update t
set valid_date = GETDATE()
From Target T
JOIN SOURCE  S ON S.ID = T.ID
LEFT JOIN SOURCE S2 ON S2.Houses = t.Houses
WHERE S2.Houses is null

Target:

ID	namn	middlename	Houses	date
1	demo	hello	2	null
2	demo2	test	4	null
3	demo3	test1	5	null

Source:

ID	namn	middlename	Houses
1	demo	hello	null
3	demo	world	null

Expected output after running update query :

ID	namn	middlename	Houses	date
1	demo	hello	2	2022-12-06
2	demo2	test	4	null
3	demo3	test1	5	2022-12-06

CodePudding user response：

I would recommend exists:

update t
set valid_date = getdate()
from target t
where exists (select 1 from source s where s.id = t.id and s.houses is null)

Note that your original query does not exactly do what you want. It cannot distinguish source rows that do not exist from source rows that exist and whose houses column is null. In your example, it would update row 2, which is not what you seem to want. You would need an INNER JOIN instead of the LEFT JOIN.

With EXISTS, you want an index on source(id, houses) so the subquery can execute efficiently against target rows. This index is probably worthwhile for the the JOIN as well.

CodePudding user response：

I don't see why you'd need to join on the column houses at all.

Find all rows in source that have value NULL in the column houses. Then update all rows in target that have the IDs of the source rows.

I prefer to write these kind of complex updates using CTEs. It looks more readable to me.

WITH
CTE
AS
(
    SELECT
        Target.ID
        ,Target.Date
    FROM
        Source
        INNER JOIN Target ON Target.ID = Source.ID
    WHERE Source.Houses IS NULL 

)
UPDATE CTE
SET Date = GETDATE();

To efficiently find rows in source that have value NULL in the column houses you should create an index, something like this:

CREATE INDEX IX_Houses ON Source
(
   Houses
);

I assume that ID is a primary key with a clustered unique index, so ID would be included in the IX_Houses index implicitly.