I would like to be able to select all entries from the orders table where a certain product has been ordered prior to 2019 but not after it. The table has close to 7M entries and the below query seems to take almost ~4 minutes to run. Note that in the orders table productId is a foreign key to products table and is indexed. Could we rewrite the below query to be more optimized and better in performance time ? Any help is greatly appreciated. Thank you
SELECT distinct *
FROM orders o
WHERE o.year < '2019'
AND o.productid NOT IN (
SELECT distinct(productid)
FROM orders
WHERE year > '2019');
CodePudding user response:
You could use not exists.
Hopefully the year
column is not a varchar so you should not be using string literals. Presumably using select *
means there won't be any duplicates so you should remove distinct.
Your year ranges also exclude 2019 completely, so presumably one of your predicates should be equal to 2019?
select *
from orders o
where o.year < 2019
and not exists (
select *
from orders o2
where o2.productid = o.productid
and Year >= 2019
);
CodePudding user response:
Probably both uses of DISTINCT
were useless.
Add this composite index (to at least help the NOT EXISTS):
INDEX(product_id, year)