Today, we had major performance issue caused by this exact query (with different names of course):
UPDATE foo
LEFT JOIN bar
ON bar.foo_id = foo.id
SET foo.flag = 1
WHERE bar.foo_id IS NULL;
The software is third-party and we didn't write the code or database schema.
foo
has 42k rows, bar
has 130k. A particular foo.id
occurred very often in bar.foo_id
. (bar
is a relationship table implementing an m:n relationship between foo
and qux
. Almost each row in qux
will have a relationship with a specific foo
row and then to several more that are unique to it. There may be more duplicates, likely not important.) An index on foo.id
exists. bar.foo_id
is part of an index also containing bar.qux_id
and unrelated bar.text
.
The query ran for 55min at 100% CPU of a db.t3.medium
instance using Aurora MySQL 5.7. The slowlog stated Rows_examined: 5719954827
.
Can someone explain what exactly the reason for this number and the resulting low performance is? 42k * 130k is close to it so that probably has something to do with it.
In our case, we could simply GROUP BY foo_id
and then join with the result instead of bar
which solved the problem. However, that won't be possible in all circumstances - and I don't think creating an index on bar.foo_id
should have an effect as NULL
values can only occur due to the JOIN
statement.
The EXPLAIN
output is as follows:
id|select_type|table |partitions|type |possible_keys|key |key_len|ref|rows |filtered|Extra |
-- ----------- ----------------- ---------- ----- ------------- --------------------- ------- --- ------ -------- ------------------------
1|UPDATE |foo | |ALL | | | | | 42921| 100.0| |
1|SIMPLE |bar | |index| |uidx_qux_foo |780 | |129465| 10.0|Using where; Using index|
To my knowledge, EXPLAIN PLAN
is not available in MySQL.
CodePudding user response:
You're only writing fields in one table, the aptly-named foo
, so write your query to only seek/scan once into that one table:
update foo set flag = 1 where ...
Instead, what you wrote does a product of your two (three?) tables and then sets the value for every single one of the resulting rows. That's needlessly wasteful, since, again, you only want to set data in your foo
table at most once per row.
And once you fix your overall logic, what you're left with is the most typical database access pattern: providing correct indexes for your select
statement (ie, the part after where
above). A simple explain plan
should show you exactly what the problem is, if any. Your post is of no help at all here.
CodePudding user response:
Rows_examined: 5719954827
is a strong indicator.
Change the query to this "semi-join". It will not need to repeatedly set foo.flag, but will do int only once per row in foo.
UPDATE foo
SET foo.flag = 1
WHERE NOT EXISTS ( SELECT 1 FROM bar
WHERE bar.foo_id = foo.id );
That will probably give you something like "84K rows examined" (84K = 2*42K).
And bar
does need an INDEX
starting with foo_id
.
Why do you need this flag? Why do you need to change lots of rows at once? Can't you design the app some other way?
For any further discuss, please provide SHOW CREATE TABLE
.