TSQL: Selecting parent based on conditions on child-CodePudding

I have a parent table Orders and a Child table Jobs with the following sample data

I want to select Orders based on the following requirements

1>For each order there may be 0 or more jobs. Do not select order if it does not have any job.
2>A user cannot work on more than one job that belongs to the same order.
For example User 1 cannot work on the Jobs that belongs to Order 1 and 2 because he already worked on jobs 1 and 4 from the same order.
3>Only select orders that have jobs in Requested status

I have the following query that gives me expected result

DECLARE @UserID INT = 2

SELECT O.OrderID
FROM Orders O
JOIN Jobs J ON J.OrderID = O.OrderID
WHERE 
J.JobStatus = 'Requested' AND
NOT EXISTS
(  
    --Must not have worked this Order
    SELECT 1 FROM Jobs J1
    WHERE J1.OrderID = O.OrderID AND J1.UserID = @UserID
)
Group By o.OrderID

SQL Fiddle Demo

Query joins the Jobs table twice. I am trying to optimize query and looking for a way to achieve the expected result by using Jobs table only once if possible. Any other solution is also appreciated. I can alter the table schema if required.

The jobs table has almost 20M rows and some time query shows poor performance. (Yes, we looked at indexes). I think its scanning jobs table twice causing the performance issue.

CodePudding user response：

You might consider adding the following index to the Jobs table:

CREATE INDEX idx_jobs ON Jobs (OrderID, UserID, JobStatus);

This index, if used, should speed up the not exists subquery in your query above. Also, it can be used for the join from Orders to Jobs in the outer top level query (although SQL Server would probably have to do an index scan).

CodePudding user response：

If the aim is just to "use Jobs table only once", I would try something like:

DECLARE @UserID INT = 2
    
SELECT 
    O.OrderID
FROM 
    Orders O
    INNER JOIN Jobs J ON J.OrderID = O.OrderID  
GROUP BY
    O.OrderID
HAVING
    SUM(CASE WHEN J.JobStatus = 'Requested' THEN 1 ELSE 0 END) > 0
    AND SUM(CASE WHEN J.UserID = @UserId THEN 1 ELSE 0 END) = 0

SQL Fiddle

To optimize further, I would suggest replacing the varchar datatype of the JobStatus column with tinyint one (and create a JobStatuses table). Once your Job table has 20M records, then varchar(10) gives you 190 Mb, however using the tinyint reduces the column size to 19 Mb — this gives you less IO-Read operations.

And I would try to separate the child filtering from joining it with the parents. Such approach may use less memory for a single operation and win in performance because of that:

DECLARE @UserID INT = 2
DECLARE @OrderIDs TABLE (OrderID INT NOT NULL PRIMARY KEY)

INSERT INTO @OrderIDs
SELECT
    OrderID
FROM
    Jobs
GROUP BY
    OrderID
HAVING
    SUM(CASE WHEN JobStatus = 'Requested' THEN 1 ELSE 0 END) > 0
    AND SUM(CASE WHEN UserID = @UserId THEN 1 ELSE 0 END) = 0

SELECT
    O.*
FROM
    Orders O
    INNER JOIN @OrderIDs ids on ids.OrderID = O.OrderID