I have a mysql query which combines data from 3 tables, which I'm calling "first_table", "second_table", and "third_table" as shown below.
This query consistently shows up in the MySQL slow query log, even though all fields referenced in the query are indexed, and the actual amount of data in these tables is not large (< 1000 records, except for "third_table" which has more like 10,000 records).
I'm trying to determine if there is a better way to structure this query to achieve better performance, and what part of this query is likely to be the most likely culprit for causing the slowdown.
Please note that "third_table.placements" is a JSON field type. All "label" fields are varchar(255), "id" fields are primary key integer fields, "sample_img" is an integer, "guid" is a string, "deleted" is an integer, and "timestamp" is a datetime.
SELECT DISTINCT first_table.id,
first_table.label,
(SELECT guid
FROM second_table
WHERE second_table.id = first_table.sample_img) AS guid,
Count(third_table.id) AS
related_count,
Sum(Json_length(third_table.placements)) AS
placements_count
FROM first_table
LEFT JOIN third_table
ON Json_overlaps(third_table.placements,
Cast(first_table.id AS CHAR))
WHERE first_table.deleted IS NULL
AND third_table.deleted IS NULL
AND Unix_timestamp(third_table.timestamp) >= 1647586800
AND Unix_timestamp(third_table.timestamp) < 1648191600
GROUP BY first_table.id
ORDER BY Lower(first_table.label) ASC
LIMIT 0, 1000
CodePudding user response:
The biggest problem is that these are not sargable:
WHERE ... Unix_timestamp(third_table.timestamp) < 1648191600
ORDER BY Lower(first_table.label)
That is, don't hide a potentially indexed column inside a function call. Instead:
WHERE ... third_table.timestamp < FROM_UNIXTIME(1648191600)
and use a case insensitive COLLATION
for first_table.label
. That is any collation ending in _ci
. (Please provide SHOW CREATE TABLE
so I can point that out, and to check the vague "all fields are indexed" -- That usually indicates not knowing the benefits of "composite" indexes.)
Json_overlaps(...)
is probably also not sargable. But it gets trickier to fix. Please explain the structure of the json and the types of id
and placements
.
Do you really need 1000 rows in the output? That is quite large for "pagination".
How big are the tables? UUIDs/GUIDs are notorious when the tables are too big to be cached in RAM.
It is possibly never useful to have both SELECT DISTINCT
and GROUP BY
. Removing the DISTINCT
may speed up the query by avoiding an extra sort.
Do you really want LEFT JOIN
, not just JOIN
? (I don't understand the query enough to make a guess.)
After you have fixed most of those, and if you still need help, I may have a way to get rid of the GROUP BY
by adding a 'derived' table. Later. (Then I may be able to address the "json_overlaps" discussion.)
Please provide EXPLAIN SELECT ...