Home > Mobile >  Optimizing sql queries with NOT IN clause on the same table
Optimizing sql queries with NOT IN clause on the same table

Time:06-05

I would like to be able to select all entries from the orders table where a certain product has been ordered prior to 2019 but not after it. The table has close to 7M entries and the below query seems to take almost ~4 minutes to run. Note that in the orders table productId is a foreign key to products table and is indexed. Could we rewrite the below query to be more optimized and better in performance time ? Any help is greatly appreciated. Thank you

SELECT distinct *
FROM orders o
WHERE o.year < '2019'
AND o.productid NOT IN (
                        SELECT distinct(productid)
                        FROM orders
                        WHERE year > '2019');

CodePudding user response:

You could use not exists.

Hopefully the year column is not a varchar so you should not be using string literals. Presumably using select * means there won't be any duplicates so you should remove distinct.

Your year ranges also exclude 2019 completely, so presumably one of your predicates should be equal to 2019?

select *
from orders o
where o.year < 2019
  and not exists (
    select *
    from orders o2
    where o2.productid = o.productid
      and Year >= 2019
  );

CodePudding user response:

Probably both uses of DISTINCT were useless.

Add this composite index (to at least help the NOT EXISTS):

INDEX(product_id, year)
  • Related