I have a legacy query that is terribly slow. I'll show the query, and the background to it after. The query takes ~ 10s which is ridiculously slow. Explain gives me:
Query:
select id as Id,
eid as AccountId,
Surname
from staff
LEFT JOIN app_roles ON (app_roles.app_staff_id = staff.id )
where staff.eid = 7227
AND app_roles.application_id = '1'
and last_modified > '2022-05-11 13:15:21Z'
Staff table contains 280k rows, app_roles contains 644k rows. Staff rows with eid 7727 - 87 rows. app_roles rows for those matching staff id's - 75 rows
Table structures:
CREATE TABLE `app_roles` (
`application_id` varchar(40) NOT NULL,
`app_staff_id` varchar(40) NOT NULL,
`role` varchar(40) NOT NULL,
PRIMARY KEY (`application_id`,`app_staff_id`),
KEY `application_id` (`application_id`),
KEY `app_staff_id` (`app_staff_id`)
) ENGINE=InnoDB
CREATE TABLE `staff` (
`eid` int NOT NULL,
`id` varchar(40) NOT NULL,
`forename` varchar(60) CHARACTER SET utf8 COLLATE utf8_general_ci NOT NULL,
`surname` varchar(150) CHARACTER SET utf8 COLLATE utf8_general_ci NOT NULL,
... columns omitted for simplicity
PRIMARY KEY (`eid`,`id`),
KEY `email` (`email`),
KEY `app_login` (`app_login`),
KEY `app_passwd` (`app_password`),
KEY `id` (`id`),
KEY `eid` (`eid`)
) ENGINE=InnoDB
---- ------------- ----------- ------------ -------- ------------------------------------- ---------------- --------- --------------------------------------- -------- ---------- --------------------------
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
---- ------------- ----------- ------------ -------- ------------------------------------- ---------------- --------- --------------------------------------- -------- ---------- --------------------------
| 1 | SIMPLE | app_roles | NULL | ref | PRIMARY,application_id,app_staff_id | application_id | 42 | const | 330114 | 100.00 | Using where; Using index |
| 1 | SIMPLE | staff | NULL | eq_ref | PRIMARY,id,eid | PRIMARY | 126 | const,inventry.app_roles.app_staff_id | 1 | 33.33 | Using where |
---- ------------- ----------- ------------ -------- ------------------------------------- ---------------- --------- --------------------------------------- -------- ---------- --------------------------
I don't understand why the left join and the where are not filtering rows out, and why the indexes are not helping.
CodePudding user response:
All other things being equal, MySQL likes to do joins by primary key lookup. It has a strong preference for that, because primary key lookups are a bit more efficient than secondary key lookups.
It may even change the order of the join to satisfy this preference. Inner join is commutative, so the optimizer can access either table first and then join to the other.
But you used a LEFT [OUTER] JOIN
, so how can this be optimized like an inner join? You wrote a condition app_roles.application_id = '1'
in the WHERE clause. If you test for a non-NULL value on the right table of a left outer join, it eliminates any of the rows that would make that join an outer join. It's effectively an inner join. Therefore the optimizer is free to reorder the tables in the join.
Both orders of join result in the join using primary key lookups. In both cases, the first column of the lookup is based on a constant condition in your query. The second column of the lookup is a reference from the first table.
So the optimizer has a dilemma. It can choose either join order, and both satisfy the preference for a primary key lookup. So it chooses one arbitrarily.
The failure is that it apparently didn't take into account that the condition on application_id
causes it to examine over 330k rows. Either the optimizer has a blindness to this cost, or else the table statistics are not up to date and are fooling the optimizer.
You can refresh the table statistics. This is easy to do and has very small impact on the running system, so you might as well do it to rule out the possibility that bad statistics are causing a bad query optimization.
ANALYZE TABLE app_roles;
ANALYZE TABLE staff;
Then try your query again.
If it's still choosing a bad optimization strategy, you can use a join hint to force it to use the join order matching what you wrote in your query.
select id as Id,
eid as AccountId,
Surname
from staff
STRAIGHT_JOIN app_roles ON (app_roles.app_staff_id = staff.id )
where staff.eid = 7227
AND app_roles.application_id = '1'
and last_modified > '2022-05-11 13:15:21Z'
There might also be a way to incorporate last_modified
into an index, but I can't tell which table it belongs to.
CodePudding user response:
I would assume you have an issue with the character set / collation. Make sure the fields you are joining match. To verify this, run :
SHOW FULL COLUMNS FROM staff;
SHOW FULL COLUMNS FROM app_roles;
More specifically, make sure app_roles.app_staff_id and staff.id are the same type.