I am trying to find the total number of outlets(nation_id wise) that exist in the orders table in a given date range.
select o2.nation_id,count(o2.id) as outlet_count
from outlets o2
where o2.id in (
SELECT distinct o.outlet_id
from orders o
where order_date >= '2022-05-01 00:00:00'
and order_date <= '2022-06-30 23:59:59'
)
group by o2.nation_id ```
Now, this query gives the exact result but it takes around 3 seconds. Is there any way to perform this query faster? Probably less than 1 second. N.B.: Outlets table contains around 25k data and orders table contains around 12 lac (1.2M) rows.
CodePudding user response:
Avoiding IN
will boost the performance and creating indices for nation_id & id columns in outlets table and outlet_id column of orders table will definitely improve the speed.
select o2.nation_id, count(Ord.outlet_id) as outlet_count
from outlets o2
LEFT JOIN
(
SELECT distinct o.outlet_id
from orders o
where order_date >= '2022-05-01 00:00:00'
and order_date <= '2022-06-30 23:59:59'
) Ord ON o2.id = Ord.outlet_id
group by o2.nation_id
CodePudding user response:
Give this a try. It's something tricky that I just invented and tested (on different tables):
SELECT ou.nation_id, COUNT(*) AS outlet_count
FROM ( SELECT o.outlet_id, MIN(order_date)
FROM orders AS o
WHERE o.order_date >= '2022-05-01'
AND o.order_date < '2022-05-01' INTERVAL 1 MONTH
GROUP BY o.outlet_id
) AS olist
JOIN outlets AS ou ON ou.id = olist.outlet_id
GROUP BY ou.nation_id
The GROUP BY
together with the MIN
(or MAX
, etc) is a trick to get the Optimizer to hop through the index, touching only one row per outlet_id
. Leaving out the MIN
or changing to DISTINCT
failed to involve the optimization. (Caveat: Different versions of MySQL/MariaDB may work differently here.)
orders: INDEX(outlet_id, order_date) -- required for the trick
If there are any issues, please provide SHOW CREATE TABLE
and EXPLAIN SELECT ...