I'm giving one example
-- sub-query
SELECT p.first_name, p.last_name,
d.department_count, s.total_sales
FROM persons as p
INNER JOIN
(
SELECT department_id,
COUNT(people) as department_count
FROM department as d
WHERE department_type = 'sales'
GROUP BY department_id
) as d ON d.department_id = p.department_id
LEFT OUTER JOIN
(
SELECT person_id,
SUM(sales) as total_sales
FROM orders
WHERE orders.department_id = d.department_id
GROUP BY person_id
) as s ON s.person_id = p.person_id
-- cte
WITH deps as
(
SELECT department_id,
COUNT(people) as department_count
FROM department as d
WHERE department_type = 'sales'
GROUP BY department_id
), sales as
(
SELECT person_id,
SUM(sales) as total_sales
FROM orders
WHERE orders.department_id = d.department_id
GROUP BY person_id
)
SELECT p.first_name, p.last_name,
d.department_count, s.total_sales
FROM persons as p
INNER JOIN deps as d
ON d.department_id = p.department_id
LEFT OUTER JOIN sales as s
ON s.person_id = p.person_id
but I'm also wanting the answer in overall case. In some cases it may depend on the dataset and objective? But usually, which one is better for optimization/performance when running the query? Moreover, if there's few less lines in any of these procedure compared to the other, will that make the execution faster?
CodePudding user response:
Both examples you show will be executed by MySQL using temporary tables. That is, the result of both the subquery or the CTE will be stored in a temporary table that lives for the duration of the query, then automatically dropped when the query ends.
Temporary tables are used for other types of queries in MySQL. You can read more about them here: https://dev.mysql.com/doc/refman/8.0/en/internal-temporary-tables.html
Temporary tables are often associated with performance overhead. It takes time for the temporary table to be created and filled with rows from the result of the subquery or CTE. This is unavoidable.
If you can run a different query to get the result you want without creating a temporary table, that's almost always better for performance. But in the examples you show, I don't think it's possible to do in a single query.
Almost every general rule about performance has exceptions, so you really need to be careful to evaluate performance on a case by case basis. Performance optimization is a complex subject.
CodePudding user response:
These indexes may help:
orders: INDEX(department_id, person_id)
p: INDEX(department_id, first_name, last_name, person_id)
s: INDEX(person_id, total_sales)
d: INDEX(department_type, department_id)
Typically COUNT(*)
is better than COUNT(col)