I am working with a grails application and I added an index to speed up specific queries. I realized that their processing time hadn't gone down at all so I rewrote the queries in SQL and ran them with explain - it turned out that the indexes weren't being used even though locally they were when I tested them. The difference between my local and the db that the app is using is that locally I have version 8.0 and the other db has 5.7.
The query is pretty long so here is the gist of it:
SELECT
*
FROM (
SELECT ... FROM T1
UNION SELECT ... FROM T2) T2
INNER JOIN T3 ON T2.t3_id = T3.id
INNER JOIN T4 ON T3.t4_id = T4.id
INNER JOIN T5 ON T2.t5_id = T5.id
INNER JOIN T6 on T3.t6_id = T6.id
INNER JOIN T7 on T6.t7_id = T7.id
where T2.type <> "Used" and T2.status <> "Closed"
The index I created is:
CREATE INDEX type_status
ON T1 (type,status);
CREATE INDEX type_status1
ON T2 (type,status);
Is there another index I could use that would make performance better that SQL will use? I am using mySQL.
Here is the explain: explain results
CodePudding user response:
What you're seeing is a new optimization in MySQL 8.0.22 (or 8.0.29 in your case, since you're using a UNION): derived condition pushdown. So a condition in the outer query is "pushed down" into a derived table subquery, and if possible the condition is applied so the subquery uses an index.
This code doesn't exist in MySQL 5.7, so there's no way to optimize this. In MySQL 5.7 and earlier, the derived table subquery is "materialized" as a temporary table, then the conditions of the outer query are applied to that temp table. This often results in a lot of overhead, which is why the new optimization feature was developed.
This speaks to the importance of using the same version of MySQL in development as you will deploy to in production, so you don't get surprised by differences in the features between versions. It's common for new versions of software to have new features, and those features are not added retroactively to older versions. Not only MySQL, but practically any software.
The only suggestion I have for optimizing your query is to avoid using the derived table. Do a UNION at the outer query level. This means some duplication of the other joins, but it will be optimized better.
SELECT
*
FROM T1
INNER JOIN T3 ON T2.t3_id = T3.id
INNER JOIN T4 ON T3.t4_id = T4.id
INNER JOIN T5 ON T2.t5_id = T5.id
INNER JOIN T6 on T3.t6_id = T6.id
INNER JOIN T7 on T6.t7_id = T7.id
where T1.type <> "Used" and T1.status <> "Closed"
UNION
SELECT
*
FROM T2
INNER JOIN T3 ON T2.t3_id = T3.id
INNER JOIN T4 ON T3.t4_id = T4.id
INNER JOIN T5 ON T2.t5_id = T5.id
INNER JOIN T6 on T3.t6_id = T6.id
INNER JOIN T7 on T6.t7_id = T7.id
where T2.type <> "Used" and T2.status <> "Closed"