Home > database >  How to use multiple table as a value for a single NOT IN condition?
How to use multiple table as a value for a single NOT IN condition?

Time:10-27

I have category, product, and customer table in my database. Each of this table has a column named image_id which will refer to the image table as foreign key. Basically i need to fetch image table records where the image_id not in category, product, and customer table.

Here is my current query:

SELECT
    *
FROM
    `image`
WHERE
    image.id NOT IN(
    SELECT
        product.image_id
    FROM
        product AS product
    WHERE
        product.image_id IS NOT NULL
) AND image.id NOT IN(
    SELECT
        category.image_id
    FROM
        category AS category
    WHERE
        category.image_id IS NOT NULL
) AND image.id NOT IN(
    SELECT
        customer.image_id
    FROM
        customer AS customer
    WHERE
        customer.image_id IS NOT NULL
)

The query works fine, but the NOT IN condition looks repeatable , is there a way where i can convert these multiple NOT IN conditions into one single NOT IN condition ?

CodePudding user response:

use LEFT JOIN, then add WHERE clause to filter out non-null value

SELECT i.* FROM
  `image` i
  LEFT JOIN `product`  p ON p.image_id=i.id
  LEFT JOIN `category` c ON c.image_id=i.id
  LEFT JOIN `customer` u ON u.image_id=i.id
WHERE
  p.image_id IS NULL
  AND c.image_id IS NULL
  AND u.image_id IS NULL

This works because LEFT JOIN is a JOIN that will always select the left table (in this case image table), then if the right table (in this case product, category, and customer table) does not have a joinable record, its value will be null.

By specifying WHERE p.image_id IS NULL, you ask to select images, that does not have a joinable record in product table. Repeat the same step for category and customer.

Another advantage is that this query is faster than yours since yours have 3 subqueries, you can check it with explain query statement.

Comparison of EXPLAIN query:

  • ddl
CREATE TABLE `test`.`image` ( `id` INT NOT NULL , PRIMARY KEY (`id`)) ENGINE = InnoDB;
CREATE TABLE `test`.`product` ( `id` INT NOT NULL , `image_id` INT NOT NULL , PRIMARY KEY (`id`)) ENGINE = InnoDB;
CREATE TABLE `test`.`category` ( `id` INT NOT NULL , `image_id` INT NOT NULL , PRIMARY KEY (`id`)) ENGINE = InnoDB;
CREATE TABLE `test`.`customer` ( `id` INT NOT NULL , `image_id` INT NOT NULL , PRIMARY KEY (`id`)) ENGINE = InnoDB;
ALTER TABLE `product` ADD INDEX (`image_id`);
ALTER TABLE `category` ADD INDEX (`image_id`);
ALTER TABLE `customer` ADD INDEX (`image_id`);
  • explain your NOT IN subquery
EXPLAIN SELECT
    *
FROM
    `image`
WHERE
    image.id NOT IN(
    SELECT
        product.image_id
    FROM
        product AS product
    WHERE
        product.image_id IS NOT NULL
) AND image.id NOT IN(
    SELECT
        category.image_id
    FROM
        category AS category
    WHERE
        category.image_id IS NOT NULL
) AND image.id NOT IN(
    SELECT
        customer.image_id
    FROM
        customer AS customer
    WHERE
        customer.image_id IS NOT NULL
);
  • result:
 ------ -------------------- ---------- ---------------- --------------- ---------- --------- ------ ------ -------------------------- 
| id   | select_type        | table    | type           | possible_keys | key      | key_len | ref  | rows | Extra                    |
 ------ -------------------- ---------- ---------------- --------------- ---------- --------- ------ ------ -------------------------- 
|    1 | PRIMARY            | image    | index          | NULL          | PRIMARY  | 4       | NULL | 1    | Using where; Using index |
|    4 | DEPENDENT SUBQUERY | customer | index_subquery | image_id      | image_id | 4       | func | 1    | Using index; Using where |
|    3 | DEPENDENT SUBQUERY | category | index_subquery | image_id      | image_id | 4       | func | 1    | Using index; Using where |
|    2 | DEPENDENT SUBQUERY | product  | index_subquery | image_id      | image_id | 4       | func | 1    | Using index; Using where |
 ------ -------------------- ---------- ---------------- --------------- ---------- --------- ------ ------ -------------------------- 
  • explain my LEFT JOIN WHERE IS NULL
EXPLAIN SELECT i.* FROM
  `image` i
  LEFT JOIN `product`  p ON p.image_id=i.id
  LEFT JOIN `category` c ON c.image_id=i.id
  LEFT JOIN `customer` u ON u.image_id=i.id
WHERE
  p.image_id IS NULL
  AND c.image_id IS NULL
  AND u.image_id IS NULL;
  • result:
 ------ ------------- ------- ------- --------------- ---------- --------- ----------- ------ -------------------------------------- 
| id   | select_type | table | type  | possible_keys | key      | key_len | ref       | rows | Extra                                |
 ------ ------------- ------- ------- --------------- ---------- --------- ----------- ------ -------------------------------------- 
|    1 | SIMPLE      | i     | index | NULL          | PRIMARY  | 4       | NULL      | 1    | Using index                          |
|    1 | SIMPLE      | p     | ref   | image_id      | image_id | 4       | test.i.id | 1    | Using where; Using index; Not exists |
|    1 | SIMPLE      | c     | ref   | image_id      | image_id | 4       | test.i.id | 1    | Using where; Using index; Not exists |
|    1 | SIMPLE      | u     | ref   | image_id      | image_id | 4       | test.i.id | 1    | Using where; Using index; Not exists |
 ------ ------------- ------- ------- --------------- ---------- --------- ----------- ------ -------------------------------------- 

CodePudding user response:

You can simply do UNION in your subquery like this:

SELECT *
FROM image i
  WHERE id NOT IN
    (SELECT image_id FROM product UNION
      SELECT image_id FROM category UNION
      SELECT image_id FROM customer);

*Note that UNION gives you distinct results (eliminating duplicate values) while UNION ALL will return all value including duplicates. Both can be applied here but from what I understand, UNION ALL might be faster because it didn't have the extra process of removing duplicates. However, if you're working with small set of data and you've got proper indexing, it should not have a huge impact on performance.

Same UNION subquery also can emulate LEFT JOIN proposed by @Kristian but with a different query structure; whereby you'll only need one WHERE condition:

SELECT i.*
FROM image i
LEFT JOIN
 (SELECT image_id FROM product UNION
    SELECT image_id FROM category UNION
     SELECT image_id FROM customer) B ON i.id=B.image_id
WHERE B.image_id IS NULL;

Demo fiddle

BTW, assigning table with an alias similar to it's name is just redundancy and you also don't need to append the table name in the subquery if it's only a single table, well in most case, unless you're using a function like NOT EXISTS and the tables have similar column name. Look at this example from your query:

...
WHERE
    image.id NOT IN(
    SELECT
        product.image_id
    FROM
        product AS product  <---- this is not necessary
    WHERE
        product.image_id IS NOT NULL)
        ^^^^^^^
        this also no need
...

Instead you can just write them like this:

WHERE
    image.id NOT IN(
    SELECT
        image_id
    FROM
        product
    WHERE
        image_id IS NOT NULL)
  • Related