how can I improve the performance of this slow query in mysql-CodePudding

I have a mysql query which combines data from 3 tables, which I'm calling "first_table", "second_table", and "third_table" as shown below.

This query consistently shows up in the MySQL slow query log, even though all fields referenced in the query are indexed, and the actual amount of data in these tables is not large (< 1000 records, except for "third_table" which has more like 10,000 records).

I'm trying to determine if there is a better way to structure this query to achieve better performance, and what part of this query is likely to be the most likely culprit for causing the slowdown.

Please note that "third_table.placements" is a JSON field type. All "label" fields are varchar(255), "id" fields are primary key integer fields, "sample_img" is an integer, "guid" is a string, "deleted" is an integer, and "timestamp" is a datetime.

SELECT DISTINCT first_table.id,
                first_table.label,
                (SELECT guid
                 FROM   second_table
                 WHERE  second_table.id = first_table.sample_img) AS guid,
                Count(third_table.id)                     AS
                related_count,
                Sum(Json_length(third_table.placements))  AS
                placements_count
FROM   first_table
       LEFT JOIN third_table
              ON Json_overlaps(third_table.placements,
                 Cast(first_table.id AS CHAR))
WHERE  first_table.deleted IS NULL
       AND third_table.deleted IS NULL
       AND Unix_timestamp(third_table.timestamp) >= 1647586800
       AND Unix_timestamp(third_table.timestamp) < 1648191600
GROUP  BY first_table.id
ORDER  BY Lower(first_table.label) ASC
LIMIT  0, 1000

CodePudding user response：

The biggest problem is that these are not sargable:

    WHERE ... Unix_timestamp(third_table.timestamp) < 1648191600
    ORDER BY  Lower(first_table.label)

That is, don't hide a potentially indexed column inside a function call. Instead:

    WHERE ... third_table.timestamp < FROM_UNIXTIME(1648191600)

and use a case insensitive COLLATION for first_table.label. That is any collation ending in _ci. (Please provide SHOW CREATE TABLE so I can point that out, and to check the vague "all fields are indexed" -- That usually indicates not knowing the benefits of "composite" indexes.)

Json_overlaps(...) is probably also not sargable. But it gets trickier to fix. Please explain the structure of the json and the types of id and placements.

Do you really need 1000 rows in the output? That is quite large for "pagination".

How big are the tables? UUIDs/GUIDs are notorious when the tables are too big to be cached in RAM.

It is possibly never useful to have both SELECT DISTINCT and GROUP BY. Removing the DISTINCT may speed up the query by avoiding an extra sort.

Do you really want LEFT JOIN, not just JOIN? (I don't understand the query enough to make a guess.)

After you have fixed most of those, and if you still need help, I may have a way to get rid of the GROUP BY by adding a 'derived' table. Later. (Then I may be able to address the "json_overlaps" discussion.)

Please provide EXPLAIN SELECT ...