Home > Blockchain >  Query with MySQL join is slow, is there a way to improve slow performance with subquery?
Query with MySQL join is slow, is there a way to improve slow performance with subquery?

Time:11-24

I am using InnoDB's full text search to retrieve results from 2 tables, photos and albums as follows:

SELECT p.media_extension, 
  p.media_vishash, 
  ANY_VALUE(a.title_url) as album_title_url, 
  ANY_VALUE(p.title_url) as photo_title_url, 
  p.media_time_created, 
  p.comments, 
  p.title, 
  p.text, 
  MATCH (p.title,p.text,p.tag_list,p.comment_list,p.media_filename) 
  AGAINST ('search query' IN BOOLEAN MODE) as score 
FROM albums a, 
  album_photo_map apm, 
  media p 
WHERE p.media_vishash = apm.media_vishash AND 
    apm.album_id = a.album_id AND 
    p.level <= 5 AND 
    a.level <= 5 AND 
    MATCH (p.title,p.text,p.tag_list,p.comment_list,p.media_filename) 
    AGAINST ('search query' IN BOOLEAN MODE) 
GROUP BY p.media_vishash 
ORDER BY p.datetime_created DESC, p.media_vishash DESC

executes in 0.80s.

 ---- ------------- ------- ------------ ---------- ----------------------------------------- ---------- --------- -------------------- ------ ---------- ---------------------------------------------- 
| id | select_type | table | partitions | type     | possible_keys                           | key      | key_len | ref                | rows | filtered | Extra                                        |
 ---- ------------- ------- ------------ ---------- ----------------------------------------- ---------- --------- -------------------- ------ ---------- ---------------------------------------------- 
|  1 | SIMPLE      | a     | NULL       | ALL      | PRIMARY,album_id_type,album_level,level | NULL     | NULL    | NULL               | 9147 |    49.99 | Using where; Using temporary; Using filesort |
|  1 | SIMPLE      | p     | NULL       | fulltext | media_vishash,search                    | search   | 0       | const              |    1 |    33.33 | Using where                                  |
|  1 | SIMPLE      | apm   | NULL       | ref      | PRIMARY,album_id,media_vishash          | album_id | 130     | lifebox.a.album_id |   26 |    10.00 | Using where; Using index                     |
 ---- ------------- ------- ------------ ---------- ----------------------------------------- ---------- --------- -------------------- ------ ---------- ---------------------------------------------- 

The joining with the albums table really slows down the query. I tried re-writing it without the join as follows, and performance is excellent.

SELECT p.media_extension, 
    p.media_vishash, 
    ANY_VALUE(p.title_url) as photo_title_url, 
    p.media_time_created, 
    p.comments, 
    p.title, 
    p.text,
   MATCH   (p.title,p.text,p.tag_list,p.comment_list,p.media_filename) AGAINST ('search query' IN BOOLEAN MODE) as score 
FROM media p
WHERE MATCH (p.title,p.text,p.tag_list,p.comment_list,p.media_filename) 
    AGAINST ('search query' IN BOOLEAN MODE) 

executed in 0.01s.

The issue is that I need to retrieve title_url from the albums table, and this is missing in the second query.

Is there a way I can retrieve albums.title_url as in the first query while maintaining the excellent performance of the second, without splitting this into multiple queries? Maybe with a subselect to grab title_url from the albums table?

CREATE tables below, MySQL 8.0.27:

 CREATE TABLE `media` (
  `media_id` char(22) CHARACTER SET utf8mb4 COLLATE utf8mb4_bin NOT NULL,
  `media_sha512` char(128) NOT NULL,
  `media_sha256` char(64) NOT NULL,
  `media_md5` char(32) NOT NULL,
  `media_filepath` varchar(250) NOT NULL,
  `media_vishash` char(128) NOT NULL,
  `media_filename` varchar(250) NOT NULL,
  `media_extension` varchar(10) NOT NULL,
  `media_path` varchar(250) NOT NULL,
  `media_folder` varchar(250) NOT NULL,
  `media_time_lastmod` bigint NOT NULL,
  `media_time_created` bigint DEFAULT NULL,
  `media_filesize` bigint DEFAULT NULL,
  `media_width` int DEFAULT NULL,
  `media_height` int DEFAULT NULL,
  `media_megapixels` float DEFAULT NULL,
  `media_mimetype` varchar(50) DEFAULT NULL,
  `media_type` varchar(25) DEFAULT NULL,
  `title` varchar(250) NOT NULL,
  `title_url` varchar(250) NOT NULL,
  `title_url_previous` varchar(250) DEFAULT NULL,
  `text` text,
  `year` int DEFAULT NULL,
  `day` int DEFAULT NULL,
  `month` int DEFAULT NULL,
  `sidecarimg_filepath` varchar(250) CHARACTER SET utf8mb4 COLLATE utf8mb4_0900_ai_ci DEFAULT NULL,
  `sidecarimg_md5` char(32) CHARACTER SET utf8mb4 COLLATE utf8mb4_0900_ai_ci DEFAULT NULL,
  `sidecarvideo_filepath` varchar(250) DEFAULT NULL,
  `sidecarvideo_md5` char(32) DEFAULT NULL,
  `sidecarxmp_filepath` varchar(250) CHARACTER SET utf8mb4 COLLATE utf8mb4_0900_ai_ci DEFAULT NULL,
  `sidecarxmp_md5` char(32) CHARACTER SET utf8mb4 COLLATE utf8mb4_0900_ai_ci DEFAULT NULL,
  `modified_filepath` varchar(250) CHARACTER SET utf8mb4 COLLATE utf8mb4_0900_ai_ci DEFAULT NULL,
  `modified_md5` char(32) DEFAULT NULL,
  `modified_time_lastmod` int DEFAULT NULL,
  `modified_time_created` int DEFAULT NULL,
  `datetime_created` datetime DEFAULT NULL,
  `datetime_modified` datetime DEFAULT NULL,
  `EXIF_Make` varchar(50) DEFAULT NULL,
  `EXIF_LensModel` varchar(100) CHARACTER SET utf8mb4 COLLATE utf8mb4_0900_ai_ci DEFAULT NULL,
  `EXIF_LensID` varchar(100) DEFAULT NULL,
  `EXIF_LensMount` varchar(100) DEFAULT NULL,
  `EXIF_LensFormat` varchar(100) DEFAULT NULL,
  `EXIF_Software` varchar(100) DEFAULT NULL,
  `EXIF_ByLine` varchar(100) DEFAULT NULL,
  `EXIF_Copyright` varchar(100) DEFAULT NULL,
  `EXIF_DateTimeOriginal` datetime DEFAULT NULL,
  `EXIF_ExposureTime` varchar(100) DEFAULT NULL,
  `EXIF_ShutterSpeed` varchar(100) DEFAULT NULL,
  `EXIF_FNumber` varchar(100) DEFAULT NULL,
  `EXIF_FocalLength` varchar(100) DEFAULT NULL,
  `EXIF_FocalLength35mm` varchar(100) DEFAULT NULL,
  `EXIF_ISO` varchar(100) DEFAULT NULL,
  `EXIF_DisplayProfile` varchar(100) DEFAULT NULL,
  `EXIF_Keywords` varchar(250) CHARACTER SET utf8mb4 COLLATE utf8mb4_0900_ai_ci DEFAULT NULL,
  `EXIF_Model` varchar(75) DEFAULT NULL,
  `EXIF_Orientation` varchar(100) DEFAULT NULL,
  `EXIF_ExposureProgram` varchar(100) DEFAULT NULL,
  `EXIF_BitsPerSample` varchar(100) DEFAULT NULL,
  `EXIF_FlashFire` varchar(100) DEFAULT NULL,
  `EXIF_ExposureCompensation` varchar(100) DEFAULT NULL,
  `EXIF_HDR` varchar(100) DEFAULT NULL,
  `EXIF_ColorTemperature` varchar(25) DEFAULT NULL,
  `EXIF_ColorSpace` varchar(20) DEFAULT NULL,
  `EXIF_ColorProfileDescription` varchar(50) DEFAULT NULL,
  `EXIF_CustomRendered` varchar(100) DEFAULT NULL,
  `EXIF_HistorySoftwareAgent` varchar(500) CHARACTER SET utf8mb4 COLLATE utf8mb4_0900_ai_ci DEFAULT NULL,
  `EXIF_WhiteBalance` varchar(100) DEFAULT NULL,
  `EXIF_Quality` varchar(20) DEFAULT NULL,
  `EXIF_SelfTimer` varchar(100) DEFAULT NULL,
  `EXIF_Contrast` varchar(100) DEFAULT NULL,
  `EXIF_Saturation` varchar(100) DEFAULT NULL,
  `EXIF_Sharpness` varchar(100) DEFAULT NULL,
  `EXIF_SerialNumber` varchar(100) DEFAULT NULL,
  `EXIF_ShutterCount` varchar(100) DEFAULT NULL,
  `EXIF_Rating` varchar(100) DEFAULT NULL,
  `EXIF_RatingPercent` varchar(100) DEFAULT NULL,
  `EXIF_Good` tinyint DEFAULT NULL,
  `EXIF_Subject` varchar(250) CHARACTER SET utf8mb4 COLLATE utf8mb4_0900_ai_ci DEFAULT NULL,
  `EXIF_Title` varchar(250) DEFAULT NULL,
  `EXIF_Description` text CHARACTER SET utf8mb4 COLLATE utf8mb4_0900_ai_ci,
  `EXIF_Location` varchar(250) DEFAULT NULL,
  `GPS_Altitude_Ref` varchar(50) DEFAULT NULL,
  `GPS_Altitude` varchar(50) DEFAULT NULL,
  `GPS_Latitude` varchar(50) DEFAULT NULL,
  `GPS_Longitude` varchar(50) DEFAULT NULL,
  `GPS_Latitude_Dec` double DEFAULT NULL,
  `GPS_Longitude_Dec` double DEFAULT NULL,
  `GPS_datetime` datetime DEFAULT NULL,
  `is_video` tinyint(1) NOT NULL DEFAULT '0',
  `video_duration` varchar(50) DEFAULT NULL,
  `video_framerate` varchar(50) DEFAULT NULL,
  `video_bitrate` varchar(50) DEFAULT NULL,
  `video_resolution` varchar(5) CHARACTER SET utf8mb4 COLLATE utf8mb4_0900_ai_ci DEFAULT NULL,
  `video_codec` varchar(50) DEFAULT NULL,
  `video_pixfmt` varchar(50) DEFAULT NULL,
  `video_profile` varchar(50) DEFAULT NULL,
  `video_colorprimary` varchar(50) DEFAULT NULL,
  `video_rotation` varchar(5) DEFAULT NULL,
  `audio_codec` varchar(50) DEFAULT NULL,
  `audio_bitrate` varchar(50) DEFAULT NULL,
  `audio_channels` varchar(50) DEFAULT NULL,
  `audio_bits_per_sample` varchar(50) DEFAULT NULL,
  `audio_sample_rate` varchar(50) DEFAULT NULL,
  `comment_list` text,
  `tag_list` text,
  `timestamp_last_comment` int NOT NULL DEFAULT '0',
  `timestamp_last_view` int DEFAULT NULL,
  `views` int NOT NULL DEFAULT '0',
  `comments` int NOT NULL DEFAULT '0',
  `tags` int NOT NULL DEFAULT '0',
  `thumbs` json DEFAULT NULL,
  `thumbs_datetime_created` datetime DEFAULT NULL,
  `level` tinyint NOT NULL DEFAULT '2',
  PRIMARY KEY (`media_id`),
  UNIQUE KEY `media_vishash` (`media_vishash`) USING BTREE,
  UNIQUE KEY `media_sha512` (`media_sha512`) USING BTREE,
  KEY `index_rating` (`EXIF_Rating`),
  FULLTEXT KEY `search` (`title`,`text`,`tag_list`,`comment_list`,`media_filename`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_0900_ai_ci

CREATE TABLE `album_photo_map` (
  `album_id` varchar(32) NOT NULL DEFAULT '',
  `media_vishash` varchar(128) CHARACTER SET utf8 COLLATE utf8_bin NOT NULL,
  PRIMARY KEY (`album_id`,`media_vishash`) USING BTREE,
  KEY `album_id` (`album_id`),
  KEY `media_vishash` (`media_vishash`) USING BTREE
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_0900_ai_ci

CREATE TABLE `albums` (
  `album_id` varchar(32) NOT NULL DEFAULT '',
  `title` varchar(100) NOT NULL DEFAULT '',
  `title_url` varchar(100) NOT NULL DEFAULT '',
  `location` varchar(100) CHARACTER SET utf8mb4 COLLATE utf8mb4_0900_ai_ci DEFAULT '',
  `text` text,
  `timestamp_created` int NOT NULL DEFAULT '0',
  `timestamp_modified` int DEFAULT '0',
  `timestamp_oldest` int DEFAULT NULL,
  `timestamp_newest` int DEFAULT NULL,
  `num_photos` int NOT NULL DEFAULT '0',
  `num_videos` int NOT NULL DEFAULT '0',
  `num_items` int NOT NULL DEFAULT '0',
  `type` tinyint(1) NOT NULL DEFAULT '1',
  `highlight_media_vishash` varchar(128) CHARACTER SET utf8 COLLATE utf8_bin DEFAULT NULL,
  `timestamp_last_view` int NOT NULL DEFAULT '0',
  `last_viewer` varchar(32) CHARACTER SET utf8mb4 COLLATE utf8mb4_0900_ai_ci DEFAULT '',
  `views` int NOT NULL DEFAULT '0',
  `min_level` tinyint(1) NOT NULL DEFAULT '2',
  `level` tinyint(1) NOT NULL DEFAULT '2',
  PRIMARY KEY (`album_id`),
  UNIQUE KEY `title` (`title`),
  UNIQUE KEY `title_url` (`title_url`),
  UNIQUE KEY `album_id_type` (`album_id`,`type`),
  KEY `highlight_photo_id` (`highlight_media_vishash`),
  KEY `lowest_level` (`min_level`),
  KEY `timestamp_created` (`timestamp_created`),
  KEY `type` (`type`),
  KEY `level` (`level`),
  FULLTEXT KEY `title_2` (`title`,`text`,`location`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_0900_ai_ci

CodePudding user response:

I tested your first query with EXPLAIN and found that the optimizer wants to reorder the tables so it starts with album_photo_map and then joins to the primary key of each of the other tables. This means it's not using your fulltext index at all.

So I tested the query using STRAIGHT_JOIN syntax to override the optimizer's tendency to reorder tables. This also led me to use modern JOIN ... ON syntax, which you should do anyway.

EXPLAIN SELECT p.media_extension,
  p.media_vishash, 
  ANY_VALUE(a.title_url) as album_title_url, 
  ANY_VALUE(p.title_url) as photo_title_url,
  p.media_time_created,
  p.comments,
  p.title,
  p.text,
  MATCH (p.title,p.text,p.tag_list,p.comment_list,p.media_filename)
  AGAINST ('search query' IN BOOLEAN MODE) as score
FROM media p
STRAIGHT_JOIN album_photo_map apm ON p.media_vishash = apm.media_vishash
STRAIGHT_JOIN albums a ON apm.album_id = a.album_id
WHERE
    p.level <= 5 AND
    a.level <= 5 AND
    MATCH (p.title,p.text,p.tag_list,p.comment_list,p.media_filename)
    AGAINST ('search query' IN BOOLEAN MODE)
GROUP BY p.media_vishash
ORDER BY p.datetime_created DESC, p.media_vishash DESC

This results in a more favorable optimization. It uses the fulltext index, and then joins to the other two tables by using indexes.

*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: p
   partitions: NULL
         type: fulltext
possible_keys: media_vishash,search
          key: search
      key_len: 0
          ref: const
         rows: 1
     filtered: 100.00
        Extra: Using where; Using temporary; Using filesort
*************************** 2. row ***************************
           id: 1
  select_type: SIMPLE
        table: apm
   partitions: NULL
         type: index
possible_keys: PRIMARY,album_id,media_vishash
          key: album_id
      key_len: 130
          ref: NULL
         rows: 1
     filtered: 100.00
        Extra: Using where; Using index
*************************** 3. row ***************************
           id: 1
  select_type: SIMPLE
        table: a
   partitions: NULL
         type: eq_ref
possible_keys: PRIMARY,album_id_type,level
          key: PRIMARY
      key_len: 130
          ref: test.apm.album_id
         rows: 1
     filtered: 100.00
        Extra: Using where

We can improve it a bit more. Since the media table is read first, it needs to do lookups in album_photo_map by media_vishash. But that's not the leftmost column of the latter table's primary key. If you reorder the primary key, it will be able to do a more efficient join.

alter table album_photo_map drop primary key, 
  add primary key(media_vishash,album_id);

CodePudding user response:

PRIMARY KEY (`album_id`),
UNIQUE KEY `album_id_type` (`album_id`,`type`),

In MySQL, a PK is necessarily unique and clustered with the data. Hence, that Unique key is redundant and unnecessary.

Having multiple UNIQUE keys is unusual.

Please provide EXPLAIN SELECT ... for your two queries.

When timing a query, run it twice. The second run may be a lot faster. This may be due to the first run doing a lot of I/O and the second finding all the necessary data cached.

Here's something to try:

SELECT ...
    FROM (your second query) AS a
    JOIN  album_photo_map AS apm  ON apm.album_id = a.album_id
    JOIN  media  AS p ON p.media_vishash = apm.media_vishash
    WHERE ... the rest of the conditions
    GROUP BY  p.datetime_created,      p.media_vishash
    ORDER BY  p.datetime_created DESC, p.media_vishash DESC

I chose to make the GROUP BY match the ORDER BY so as to eliminate an extra sort.

This rewrite will, I think, force it to use the FULLTEXT index first, which seems to be the key to performance. (Notice how the EXPLAIN shows a being used first.)

Other notes:

  • Splitting a date into 3 columns (year, month, day) is usually a bad idea. You will see this if you ever need to do a range test on the date.
  • Using md5s (or UUID or other hashes) for keys can be inefficient. (This comment is not relevant to the Select in question.)
  • In apm, this is redundant: KEY album_id (album_id)
  • Related