Replace COUNT comparison in SQL

Time:11-05

I have an exercise where I have to rewrite a query without using NOT EXISTS.

For example, take this query, which gets all the employees aged 30 or older who are not managers:

SELECT * 
FROM employees e 
WHERE NOT EXISTS (
        SELECT * 
        FROM managers m 
        WHERE m.name = e.name
      ) AND age >= 30

That can be rewritten in this way:

SELECT * 
FROM employees e 
WHERE (SELECT COUNT(*) FROM managers) = (SELECT COUNT(*) FROM managers m WHERE m.name != e.name) AND age >= 30

Because if the employee is not a manager, both counts will be equal (no row will be filtered out of the second subquery), and vice versa.
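One way to check that the two forms agree is to run both against a small in-memory SQLite database from Python. This is only a sketch: the sample rows are made up for illustration, and the name columns are assumed to be NOT NULL. With a NULL manager name, `m.name != e.name` evaluates to unknown, that row is not counted, and the COUNT form would drop employees that NOT EXISTS keeps.

```python
import sqlite3

# Hypothetical sample data, not taken from the original question.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE employees (name TEXT NOT NULL, age INTEGER NOT NULL);
    CREATE TABLE managers  (name TEXT NOT NULL);
    INSERT INTO employees VALUES ('Ann', 45), ('Bob', 31), ('Cid', 25), ('Dee', 38);
    INSERT INTO managers  VALUES ('Ann'), ('Eve');
""")

# Original form: NOT EXISTS.
not_exists_rows = conn.execute("""
    SELECT e.name
    FROM employees e
    WHERE NOT EXISTS (SELECT * FROM managers m WHERE m.name = e.name)
      AND e.age >= 30
    ORDER BY e.name
""").fetchall()

# Rewritten form: compare the total count with the count of non-matching rows.
count_rows = conn.execute("""
    SELECT e.name
    FROM employees e
    WHERE (SELECT COUNT(*) FROM managers)
        = (SELECT COUNT(*) FROM managers m WHERE m.name != e.name)
      AND e.age >= 30
    ORDER BY e.name
""").fetchall()

print(not_exists_rows)  # [('Bob',), ('Dee',)]
print(count_rows)       # [('Bob',), ('Dee',)]
```

Ann is filtered out because she is a manager, and Cid because of the age condition.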

More generally, if I want to remove the NOT EXISTS from this query:

SELECT * 
FROM t1 
WHERE NOT EXISTS (SELECT * FROM t2 WHERE NOT condition1) AND condition2

It is clear that I can use this:

SELECT * 
FROM t1 
WHERE (SELECT COUNT(*) FROM t2) = (SELECT COUNT(*) FROM t2 WHERE condition1) AND condition2
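The generic pattern can be tried out the same way. In the sketch below, the tables, the columns `x` and `y`, and the concrete choices `condition1 = t2.y > t1.x` and `condition2 = t1.x > 0` are all assumptions made up for illustration. The third variant, which compares a single count with zero, is not from the question; it is included only as one possible shorter COUNT-based form.

```python
import sqlite3

# Hypothetical tables; condition1 is "t2.y > t1.x", condition2 is "t1.x > 0".
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE t1 (id INTEGER PRIMARY KEY, x INTEGER NOT NULL);
    CREATE TABLE t2 (id INTEGER PRIMARY KEY, y INTEGER NOT NULL);
    INSERT INTO t1 VALUES (1, 5), (2, 10), (3, -1);
    INSERT INTO t2 VALUES (1, 7), (2, 12);
""")

def ids(predicate):
    """Return the t1 ids that satisfy the predicate AND condition2."""
    sql = f"SELECT t1.id FROM t1 WHERE {predicate} AND t1.x > 0 ORDER BY t1.id"
    return [row[0] for row in conn.execute(sql)]

via_not_exists = ids(
    "NOT EXISTS (SELECT * FROM t2 WHERE NOT (t2.y > t1.x))")
via_count_equal = ids(
    "(SELECT COUNT(*) FROM t2) = (SELECT COUNT(*) FROM t2 WHERE t2.y > t1.x)")
via_count_zero = ids(
    "(SELECT COUNT(*) FROM t2 WHERE NOT (t2.y > t1.x)) = 0")

print(via_not_exists, via_count_equal, via_count_zero)  # [1] [1] [1]
```

With non-nullable columns all three agree. If condition1 can evaluate to unknown (NULL), the two-count comparison may differ from NOT EXISTS, whereas the zero-count form counts exactly the rows that NOT EXISTS tests for.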

So my question is, keeping things generic (not using the employees and managers problem), is there a shorter way to remove the NOT EXISTS than this one?

I already know that there is nothing wrong with NOT EXISTS, but I repeat, this is an exercise to think about all SQL features.

Thanks

CodePudding user response:

Use an outer join and filter for non-joins:

SELECT * 
FROM employees e
LEFT JOIN managers m ON m.name = e.name
WHERE e.age >= 30
AND m.name IS NULL

More generically:

SELECT * 
FROM t1
LEFT JOIN t2 ON <how to join>
  AND <other conditions on t2>
WHERE <conditions on t1>
AND <t2 join column> IS NULL

If there are conditions on t2, they must go in the join condition; if you put them in the WHERE clause, the outer join effectively becomes an inner join and you will get no results.
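The difference between the two placements can be seen on a small example. This is a sketch using SQLite from Python; the `active` column on managers and all sample rows are hypothetical, chosen so that the extra t2 condition is "the manager record is active".

```python
import sqlite3

# Hypothetical data: Bob has a manager record, but it is inactive.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE employees (name TEXT NOT NULL);
    CREATE TABLE managers  (name TEXT NOT NULL, active INTEGER NOT NULL);
    INSERT INTO employees VALUES ('Ann'), ('Bob'), ('Dee');
    INSERT INTO managers  VALUES ('Ann', 1), ('Bob', 0);
""")

# Condition on managers placed in the ON clause: an anti join against
# active managers only.
in_on = [r[0] for r in conn.execute("""
    SELECT e.name
    FROM employees e
    LEFT JOIN managers m ON m.name = e.name AND m.active = 1
    WHERE m.name IS NULL
    ORDER BY e.name
""")]

# Same condition placed in WHERE: unmatched rows have m.active = NULL,
# so "m.active = 1" can never hold together with "m.name IS NULL".
in_where = [r[0] for r in conn.execute("""
    SELECT e.name
    FROM employees e
    LEFT JOIN managers m ON m.name = e.name
    WHERE m.active = 1 AND m.name IS NULL
    ORDER BY e.name
""")]

print(in_on)     # ['Bob', 'Dee']
print(in_where)  # []
```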

CodePudding user response:

NOT EXISTS is the straightforward approach to do this. You can use NOT IN instead, but you must make sure that the subquery does not return any nulls:

SELECT *
FROM employees e
WHERE e.age >= 30
AND e.name NOT IN (SELECT m.name FROM managers m);
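The warning about nulls can be reproduced with a small sketch in SQLite from Python. The sample rows are made up, and managers.name is deliberately left nullable here so that the subquery can return a NULL.

```python
import sqlite3

# Hypothetical data: one manager row has a NULL name.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE employees (name TEXT NOT NULL, age INTEGER NOT NULL);
    CREATE TABLE managers  (name TEXT);
    INSERT INTO employees VALUES ('Ann', 45), ('Bob', 31), ('Cid', 25);
    INSERT INTO managers  VALUES ('Ann'), (NULL);
""")

# Subquery returns a NULL: "name NOT IN (...)" is never true, so no rows.
naive = [r[0] for r in conn.execute("""
    SELECT e.name
    FROM employees e
    WHERE e.age >= 30
      AND e.name NOT IN (SELECT m.name FROM managers m)
""")]

# Filtering the NULLs out of the subquery restores the expected result.
guarded = [r[0] for r in conn.execute("""
    SELECT e.name
    FROM employees e
    WHERE e.age >= 30
      AND e.name NOT IN (SELECT m.name FROM managers m
                         WHERE m.name IS NOT NULL)
""")]

print(naive)    # []
print(guarded)  # ['Bob']
```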

You can also write this as an anti join. This is considered less readable, because it is not how you would express the task in human language. That is, you could say "give me the employees that are not in the managers group" or "give me the employees for which no entry exists in the managers group", but you would hardly say "give me the employees that I would get if I marked all employees with their manager assignment and then removed the employees that are marked as managers." But this is essentially how it is written. (This form is used particularly with young DBMSs whose developers have put much effort into joins and have so far neglected the EXISTS clause, so that NOT EXISTS runs into performance issues.)

SELECT e.*
FROM employees e
LEFT OUTER JOIN managers m ON m.name = e.name
WHERE e.age >= 30
AND m.name IS NULL;

CodePudding user response:

As far as I understand your question, if you want to avoid using COUNT as well, you can use LEFT JOIN and check for NULL values in the joined table.

SELECT t1.*
FROM t1
LEFT JOIN t2 ON NOT condition1
WHERE t2.id IS NULL AND condition2;

This way you perform a LEFT JOIN of t1 to t2 on NOT condition1, so a t1 row finds a match only when some t2 row violates condition1. Then you check for NULL in t2.id (assuming id is a non-nullable column in t2) to keep the rows in t1 that have no such match, which is what the NOT EXISTS condition asks for. Finally, you apply condition2 to further filter the results. Hope this helps answer your question.
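As a rough check of this generic form, the sketch below compares it with NOT EXISTS in SQLite from Python. The tables, columns and the concrete choices `condition1 = t2.y > t1.x` and `condition2 = t1.x > 0` are assumptions picked only for illustration.

```python
import sqlite3

# Hypothetical tables; t2.id is a primary key, so it is never NULL
# in a matched row.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE t1 (id INTEGER PRIMARY KEY, x INTEGER NOT NULL);
    CREATE TABLE t2 (id INTEGER PRIMARY KEY, y INTEGER NOT NULL);
    INSERT INTO t1 VALUES (1, 5), (2, 10), (3, -1);
    INSERT INTO t2 VALUES (1, 7), (2, 12);
""")

# Anti join: a t1 row survives only if no t2 row violates condition1.
anti_join = [r[0] for r in conn.execute("""
    SELECT t1.id
    FROM t1
    LEFT JOIN t2 ON NOT (t2.y > t1.x)
    WHERE t2.id IS NULL AND t1.x > 0
    ORDER BY t1.id
""")]

# Reference result using NOT EXISTS.
not_exists = [r[0] for r in conn.execute("""
    SELECT t1.id
    FROM t1
    WHERE NOT EXISTS (SELECT * FROM t2 WHERE NOT (t2.y > t1.x))
      AND t1.x > 0
    ORDER BY t1.id
""")]

print(anti_join)   # [1]
print(not_exists)  # [1]
```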

CodePudding user response:

SELECT
  *
FROM employees
LEFT JOIN managers ON managers.name=employees.name
WHERE managers.name IS NULL

The principle is to left join the base table with the search table (the one you would otherwise apply the NOT EXISTS predicate to), and then filter on the join column of the search table being NULL. I would have expected the query to be self-explanatory.
