I have a table of purchase orders, each order's lines, and a code for each line:
Order_ID | LINE | CODE |
---|---|---|
A0001 | 1 | aaaa |
A0002 | 1 | bbbb |
A0002 | 2 | xxxx |
A0003 | 1 | cccc |
A0004 | 1 | xxxx |
A0004 | 2 | dddd |
And I need to filter out every order that has at least one line with the code 'xxxx', leaving:
Order_ID | LINE | CODE |
---|---|---|
A0001 | 1 | aaaa |
A0003 | 1 | cccc |
I thought something like this:
SELECT *
FROM MyTable
WHERE Order_ID not in (SELECT * FROM MyTable WHERE CODE = 'xxxx')
BUT, the big problem here is that I'm working with a pretty big query so the subquery is also too large and the whole query takes a lot to run. Is there any workaround to avoid the subquery?
CodePudding user response:
Beyond the syntax issue others have pointed out, the big problem is that the subquery is evaluated for each row. I ran a test on a similarly large table with a non-unique field (like your Order_ID) and found very large savings by structuring the query with a CTE:
WITH bad_ids AS (
SELECT DISTINCT Order_ID
FROM MyTable
WHERE CODE = 'xxxx'
)
SELECT *
FROM MyTable m
LEFT JOIN bad_ids b
ON m.Order_ID = b.Order_ID
WHERE b.Order_ID IS NULL;
Here I'm assuming Order_ID is not allowed to be NULL.
Given the huge difference I saw in performance, I'd be surprised if you didn't see the same, no matter what the differences are between query planners.
Edit: Essentially this works because the query above scans the table twice and joins once, whereas your query re-scans a subset of the table once for every row.
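The same anti-join can also be written with NOT EXISTS, which most query planners optimize the same way as the LEFT JOIN ... IS NULL pattern above (a sketch against the same example MyTable):

```sql
-- Anti-join via NOT EXISTS: keep only orders that have
-- no line at all with CODE = 'xxxx'.
SELECT m.*
FROM MyTable m
WHERE NOT EXISTS (
    SELECT 1
    FROM MyTable x
    WHERE x.Order_ID = m.Order_ID
      AND x.CODE = 'xxxx'
);
```

Unlike NOT IN, NOT EXISTS also behaves correctly if the subquery could ever return NULLs, since it only checks for the existence of a matching row.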
CodePudding user response:
The column Order_ID should be compared against a single matching column, not against all columns (*):
SELECT *
FROM MyTable
WHERE Order_ID not in (SELECT Order_ID FROM MyTable WHERE CODE = 'xxxx')
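If the corrected query is still slow at this table size, a composite index covering the filter and join columns usually helps. This is a sketch using the table and column names from the example; the index name is hypothetical and exact syntax may vary by RDBMS:

```sql
-- Hypothetical index: lets the CODE = 'xxxx' subquery and the
-- Order_ID comparison be answered from the index alone,
-- instead of scanning the whole table each time.
CREATE INDEX idx_mytable_code_order
    ON MyTable (CODE, Order_ID);
```

Putting CODE first matches the equality filter in the subquery; Order_ID second makes the index covering for the values the outer query compares against.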
CodePudding user response:
SELECT * FROM MyTable WHERE CODE <> 'xxxx'
Note that this only removes the individual lines whose code is 'xxxx'; other lines of the same order (e.g. A0002 line 1) will still be returned, which is not quite what was asked.