Is CTE better for optimization than sub-queries in sql/mysql?-CodePudding

I'm giving one example

-- sub-query

SELECT  p.first_name, p.last_name,
        d.department_count, s.total_sales
    FROM  persons as p
    INNER JOIN  
    (
        SELECT  department_id,
                COUNT(people) as department_count
            FROM  department as d
            WHERE  department_type = 'sales'
            GROUP BY  department_id 
    ) as d  ON d.department_id = p.department_id
    LEFT OUTER JOIN  
    (
        SELECT  person_id,
                SUM(sales) as total_sales
            FROM  orders
            WHERE  orders.department_id = d.department_id
            GROUP BY  person_id 
    ) as s  ON s.person_id = p.person_id

-- cte

WITH deps as 
    (
        SELECT  department_id,
                COUNT(people) as department_count
            FROM  department as d
            WHERE  department_type = 'sales'
            GROUP BY  department_id 
    ), sales as 
    (
        SELECT  person_id,
                SUM(sales) as total_sales
            FROM  orders
            WHERE  orders.department_id = d.department_id
            GROUP BY  person_id 
    ) 
SELECT  p.first_name, p.last_name,
        d.department_count, s.total_sales
    FROM  persons as p
    INNER JOIN  deps as d
            ON d.department_id = p.department_id
    LEFT OUTER JOIN  sales as s
            ON s.person_id = p.person_id

but I'm also wanting the answer in overall case. In some cases it may depend on the dataset and objective? But usually, which one is better for optimization/performance when running the query? Moreover, if there's few less lines in any of these procedure compared to the other, will that make the execution faster?

CodePudding user response：

Both examples you show will be executed by MySQL using temporary tables. That is, the result of both the subquery or the CTE will be stored in a temporary table that lives for the duration of the query, then automatically dropped when the query ends.

Temporary tables are used for other types of queries in MySQL. You can read more about them here: https://dev.mysql.com/doc/refman/8.0/en/internal-temporary-tables.html

Temporary tables are often associated with performance overhead. It takes time for the temporary table to be created and filled with rows from the result of the subquery or CTE. This is unavoidable.

If you can run a different query to get the result you want without creating a temporary table, that's almost always better for performance. But in the examples you show, I don't think it's possible to do in a single query.

Almost every general rule about performance has exceptions, so you really need to be careful to evaluate performance on a case by case basis. Performance optimization is a complex subject.

CodePudding user response：

These indexes may help:

orders:  INDEX(department_id, person_id)
p:  INDEX(department_id,  first_name, last_name, person_id)
s:  INDEX(person_id,  total_sales)
d:  INDEX(department_type, department_id)

Typically COUNT(*) is better than COUNT(col)