Lets say we have two tables
Customers table:
id | name |
---|---|
1 | Joe |
2 | Henry |
3 | Sam |
4 | Max |
Orders table:
id | customerID |
---|---|
1 | 3 |
2 | 1 |
I want to find the id of customers who never order. Basically, show me an output of all the Customers.id that do not appear in the Orders table which should be:
Expected output table: | id | |:---| | 2| | 4|
The SQL code to do this is:
SELECT Customers.id
FROM Customers
WHERE NOT EXISTS (
SELECT Orders.customerId
FROM Orders
WHERE Customers.id = Orders.customerId
)
My question is: Why do we need to write the second WHERE clause? Wouldn't the subquery return the same result even without the WHERE clause?
How is:
WHERE NOT EXISTS (
SELECT Orders.customerId
FROM Orders
WHERE Customers.id = Orders.customerId
)
different to:
WHERE NOT EXISTS (
SELECT Orders.customerId
FROM Orders
)
However, the query returns zero rows without the 2nd WHERE clause and I don't understand why. Could anyone explain?
CodePudding user response:
Because you need to "join" the subquery to the main query
Imagine the inner query executed for each row of the main query.
If you don't use the inner WHERE
clause the inner query (in your case) will always return rows, so the NOT EXISTS
will always return FALSE
for each customer.
Viceversa, if the Orders table is empty the inner query will not return rows so the NOT EXISTS
will be TRUE
for all the Customers.
CodePudding user response:
The expression in the WHERE have to boolean, which means that they have to return TRUE or FALSE. If the subquery returns a number of lines is that true or false?
Your query could also be writen
SELECT Customers.id
FROM Customers
LEFT JOIN Orders
ON Customers.id = Orders.customerId
WHERE Orders.customerId IS NULL;
because the LEFT JOIN says take all records from the first table (Customers) whether we find a matching record in the second table. If there is no matching record all columns requested from the second table will be null.