I am using InnoDB's full text search to retrieve results from 2 tables, photos and albums as follows:
SELECT p.media_extension,
p.media_vishash,
ANY_VALUE(a.title_url) as album_title_url,
ANY_VALUE(p.title_url) as photo_title_url,
p.media_time_created,
p.comments,
p.title,
p.text,
MATCH (p.title,p.text,p.tag_list,p.comment_list,p.media_filename)
AGAINST ('search query' IN BOOLEAN MODE) as score
FROM albums a,
album_photo_map apm,
media p
WHERE p.media_vishash = apm.media_vishash AND
apm.album_id = a.album_id AND
p.level <= 5 AND
a.level <= 5 AND
MATCH (p.title,p.text,p.tag_list,p.comment_list,p.media_filename)
AGAINST ('search query' IN BOOLEAN MODE)
GROUP BY p.media_vishash
ORDER BY p.datetime_created DESC, p.media_vishash DESC
executes in 0.80s.
---- ------------- ------- ------------ ---------- ----------------------------------------- ---------- --------- -------------------- ------ ---------- ----------------------------------------------
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
---- ------------- ------- ------------ ---------- ----------------------------------------- ---------- --------- -------------------- ------ ---------- ----------------------------------------------
| 1 | SIMPLE | a | NULL | ALL | PRIMARY,album_id_type,album_level,level | NULL | NULL | NULL | 9147 | 49.99 | Using where; Using temporary; Using filesort |
| 1 | SIMPLE | p | NULL | fulltext | media_vishash,search | search | 0 | const | 1 | 33.33 | Using where |
| 1 | SIMPLE | apm | NULL | ref | PRIMARY,album_id,media_vishash | album_id | 130 | lifebox.a.album_id | 26 | 10.00 | Using where; Using index |
---- ------------- ------- ------------ ---------- ----------------------------------------- ---------- --------- -------------------- ------ ---------- ----------------------------------------------
The joining with the albums table really slows down the query. I tried re-writing it without the join as follows, and performance is excellent.
SELECT p.media_extension,
p.media_vishash,
ANY_VALUE(p.title_url) as photo_title_url,
p.media_time_created,
p.comments,
p.title,
p.text,
MATCH (p.title,p.text,p.tag_list,p.comment_list,p.media_filename) AGAINST ('search query' IN BOOLEAN MODE) as score
FROM media p
WHERE MATCH (p.title,p.text,p.tag_list,p.comment_list,p.media_filename)
AGAINST ('search query' IN BOOLEAN MODE)
executed in 0.01s.
The issue is that I need to retrieve title_url from the albums table, and this is missing in the second query.
Is there a way I can retrieve albums.title_url as in the first query while maintaining the excellent performance of the second, without splitting this into multiple queries? Maybe with a subselect to grab title_url from the albums table?
CREATE tables below, MySQL 8.0.27:
CREATE TABLE `media` (
`media_id` char(22) CHARACTER SET utf8mb4 COLLATE utf8mb4_bin NOT NULL,
`media_sha512` char(128) NOT NULL,
`media_sha256` char(64) NOT NULL,
`media_md5` char(32) NOT NULL,
`media_filepath` varchar(250) NOT NULL,
`media_vishash` char(128) NOT NULL,
`media_filename` varchar(250) NOT NULL,
`media_extension` varchar(10) NOT NULL,
`media_path` varchar(250) NOT NULL,
`media_folder` varchar(250) NOT NULL,
`media_time_lastmod` bigint NOT NULL,
`media_time_created` bigint DEFAULT NULL,
`media_filesize` bigint DEFAULT NULL,
`media_width` int DEFAULT NULL,
`media_height` int DEFAULT NULL,
`media_megapixels` float DEFAULT NULL,
`media_mimetype` varchar(50) DEFAULT NULL,
`media_type` varchar(25) DEFAULT NULL,
`title` varchar(250) NOT NULL,
`title_url` varchar(250) NOT NULL,
`title_url_previous` varchar(250) DEFAULT NULL,
`text` text,
`year` int DEFAULT NULL,
`day` int DEFAULT NULL,
`month` int DEFAULT NULL,
`sidecarimg_filepath` varchar(250) CHARACTER SET utf8mb4 COLLATE utf8mb4_0900_ai_ci DEFAULT NULL,
`sidecarimg_md5` char(32) CHARACTER SET utf8mb4 COLLATE utf8mb4_0900_ai_ci DEFAULT NULL,
`sidecarvideo_filepath` varchar(250) DEFAULT NULL,
`sidecarvideo_md5` char(32) DEFAULT NULL,
`sidecarxmp_filepath` varchar(250) CHARACTER SET utf8mb4 COLLATE utf8mb4_0900_ai_ci DEFAULT NULL,
`sidecarxmp_md5` char(32) CHARACTER SET utf8mb4 COLLATE utf8mb4_0900_ai_ci DEFAULT NULL,
`modified_filepath` varchar(250) CHARACTER SET utf8mb4 COLLATE utf8mb4_0900_ai_ci DEFAULT NULL,
`modified_md5` char(32) DEFAULT NULL,
`modified_time_lastmod` int DEFAULT NULL,
`modified_time_created` int DEFAULT NULL,
`datetime_created` datetime DEFAULT NULL,
`datetime_modified` datetime DEFAULT NULL,
`EXIF_Make` varchar(50) DEFAULT NULL,
`EXIF_LensModel` varchar(100) CHARACTER SET utf8mb4 COLLATE utf8mb4_0900_ai_ci DEFAULT NULL,
`EXIF_LensID` varchar(100) DEFAULT NULL,
`EXIF_LensMount` varchar(100) DEFAULT NULL,
`EXIF_LensFormat` varchar(100) DEFAULT NULL,
`EXIF_Software` varchar(100) DEFAULT NULL,
`EXIF_ByLine` varchar(100) DEFAULT NULL,
`EXIF_Copyright` varchar(100) DEFAULT NULL,
`EXIF_DateTimeOriginal` datetime DEFAULT NULL,
`EXIF_ExposureTime` varchar(100) DEFAULT NULL,
`EXIF_ShutterSpeed` varchar(100) DEFAULT NULL,
`EXIF_FNumber` varchar(100) DEFAULT NULL,
`EXIF_FocalLength` varchar(100) DEFAULT NULL,
`EXIF_FocalLength35mm` varchar(100) DEFAULT NULL,
`EXIF_ISO` varchar(100) DEFAULT NULL,
`EXIF_DisplayProfile` varchar(100) DEFAULT NULL,
`EXIF_Keywords` varchar(250) CHARACTER SET utf8mb4 COLLATE utf8mb4_0900_ai_ci DEFAULT NULL,
`EXIF_Model` varchar(75) DEFAULT NULL,
`EXIF_Orientation` varchar(100) DEFAULT NULL,
`EXIF_ExposureProgram` varchar(100) DEFAULT NULL,
`EXIF_BitsPerSample` varchar(100) DEFAULT NULL,
`EXIF_FlashFire` varchar(100) DEFAULT NULL,
`EXIF_ExposureCompensation` varchar(100) DEFAULT NULL,
`EXIF_HDR` varchar(100) DEFAULT NULL,
`EXIF_ColorTemperature` varchar(25) DEFAULT NULL,
`EXIF_ColorSpace` varchar(20) DEFAULT NULL,
`EXIF_ColorProfileDescription` varchar(50) DEFAULT NULL,
`EXIF_CustomRendered` varchar(100) DEFAULT NULL,
`EXIF_HistorySoftwareAgent` varchar(500) CHARACTER SET utf8mb4 COLLATE utf8mb4_0900_ai_ci DEFAULT NULL,
`EXIF_WhiteBalance` varchar(100) DEFAULT NULL,
`EXIF_Quality` varchar(20) DEFAULT NULL,
`EXIF_SelfTimer` varchar(100) DEFAULT NULL,
`EXIF_Contrast` varchar(100) DEFAULT NULL,
`EXIF_Saturation` varchar(100) DEFAULT NULL,
`EXIF_Sharpness` varchar(100) DEFAULT NULL,
`EXIF_SerialNumber` varchar(100) DEFAULT NULL,
`EXIF_ShutterCount` varchar(100) DEFAULT NULL,
`EXIF_Rating` varchar(100) DEFAULT NULL,
`EXIF_RatingPercent` varchar(100) DEFAULT NULL,
`EXIF_Good` tinyint DEFAULT NULL,
`EXIF_Subject` varchar(250) CHARACTER SET utf8mb4 COLLATE utf8mb4_0900_ai_ci DEFAULT NULL,
`EXIF_Title` varchar(250) DEFAULT NULL,
`EXIF_Description` text CHARACTER SET utf8mb4 COLLATE utf8mb4_0900_ai_ci,
`EXIF_Location` varchar(250) DEFAULT NULL,
`GPS_Altitude_Ref` varchar(50) DEFAULT NULL,
`GPS_Altitude` varchar(50) DEFAULT NULL,
`GPS_Latitude` varchar(50) DEFAULT NULL,
`GPS_Longitude` varchar(50) DEFAULT NULL,
`GPS_Latitude_Dec` double DEFAULT NULL,
`GPS_Longitude_Dec` double DEFAULT NULL,
`GPS_datetime` datetime DEFAULT NULL,
`is_video` tinyint(1) NOT NULL DEFAULT '0',
`video_duration` varchar(50) DEFAULT NULL,
`video_framerate` varchar(50) DEFAULT NULL,
`video_bitrate` varchar(50) DEFAULT NULL,
`video_resolution` varchar(5) CHARACTER SET utf8mb4 COLLATE utf8mb4_0900_ai_ci DEFAULT NULL,
`video_codec` varchar(50) DEFAULT NULL,
`video_pixfmt` varchar(50) DEFAULT NULL,
`video_profile` varchar(50) DEFAULT NULL,
`video_colorprimary` varchar(50) DEFAULT NULL,
`video_rotation` varchar(5) DEFAULT NULL,
`audio_codec` varchar(50) DEFAULT NULL,
`audio_bitrate` varchar(50) DEFAULT NULL,
`audio_channels` varchar(50) DEFAULT NULL,
`audio_bits_per_sample` varchar(50) DEFAULT NULL,
`audio_sample_rate` varchar(50) DEFAULT NULL,
`comment_list` text,
`tag_list` text,
`timestamp_last_comment` int NOT NULL DEFAULT '0',
`timestamp_last_view` int DEFAULT NULL,
`views` int NOT NULL DEFAULT '0',
`comments` int NOT NULL DEFAULT '0',
`tags` int NOT NULL DEFAULT '0',
`thumbs` json DEFAULT NULL,
`thumbs_datetime_created` datetime DEFAULT NULL,
`level` tinyint NOT NULL DEFAULT '2',
PRIMARY KEY (`media_id`),
UNIQUE KEY `media_vishash` (`media_vishash`) USING BTREE,
UNIQUE KEY `media_sha512` (`media_sha512`) USING BTREE,
KEY `index_rating` (`EXIF_Rating`),
FULLTEXT KEY `search` (`title`,`text`,`tag_list`,`comment_list`,`media_filename`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_0900_ai_ci
CREATE TABLE `album_photo_map` (
`album_id` varchar(32) NOT NULL DEFAULT '',
`media_vishash` varchar(128) CHARACTER SET utf8 COLLATE utf8_bin NOT NULL,
PRIMARY KEY (`album_id`,`media_vishash`) USING BTREE,
KEY `album_id` (`album_id`),
KEY `media_vishash` (`media_vishash`) USING BTREE
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_0900_ai_ci
CREATE TABLE `albums` (
`album_id` varchar(32) NOT NULL DEFAULT '',
`title` varchar(100) NOT NULL DEFAULT '',
`title_url` varchar(100) NOT NULL DEFAULT '',
`location` varchar(100) CHARACTER SET utf8mb4 COLLATE utf8mb4_0900_ai_ci DEFAULT '',
`text` text,
`timestamp_created` int NOT NULL DEFAULT '0',
`timestamp_modified` int DEFAULT '0',
`timestamp_oldest` int DEFAULT NULL,
`timestamp_newest` int DEFAULT NULL,
`num_photos` int NOT NULL DEFAULT '0',
`num_videos` int NOT NULL DEFAULT '0',
`num_items` int NOT NULL DEFAULT '0',
`type` tinyint(1) NOT NULL DEFAULT '1',
`highlight_media_vishash` varchar(128) CHARACTER SET utf8 COLLATE utf8_bin DEFAULT NULL,
`timestamp_last_view` int NOT NULL DEFAULT '0',
`last_viewer` varchar(32) CHARACTER SET utf8mb4 COLLATE utf8mb4_0900_ai_ci DEFAULT '',
`views` int NOT NULL DEFAULT '0',
`min_level` tinyint(1) NOT NULL DEFAULT '2',
`level` tinyint(1) NOT NULL DEFAULT '2',
PRIMARY KEY (`album_id`),
UNIQUE KEY `title` (`title`),
UNIQUE KEY `title_url` (`title_url`),
UNIQUE KEY `album_id_type` (`album_id`,`type`),
KEY `highlight_photo_id` (`highlight_media_vishash`),
KEY `lowest_level` (`min_level`),
KEY `timestamp_created` (`timestamp_created`),
KEY `type` (`type`),
KEY `level` (`level`),
FULLTEXT KEY `title_2` (`title`,`text`,`location`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_0900_ai_ci
CodePudding user response:
I tested your first query with EXPLAIN and found that the optimizer wants to reorder the tables so it starts with album_photo_map
and then joins to the primary key of each of the other tables. This means it's not using your fulltext index at all.
So I tested the query using STRAIGHT_JOIN
syntax to override the optimizer's tendency to reorder tables. This also led me to use modern JOIN ... ON
syntax, which you should do anyway.
EXPLAIN SELECT p.media_extension,
p.media_vishash,
ANY_VALUE(a.title_url) as album_title_url,
ANY_VALUE(p.title_url) as photo_title_url,
p.media_time_created,
p.comments,
p.title,
p.text,
MATCH (p.title,p.text,p.tag_list,p.comment_list,p.media_filename)
AGAINST ('search query' IN BOOLEAN MODE) as score
FROM media p
STRAIGHT_JOIN album_photo_map apm ON p.media_vishash = apm.media_vishash
STRAIGHT_JOIN albums a ON apm.album_id = a.album_id
WHERE
p.level <= 5 AND
a.level <= 5 AND
MATCH (p.title,p.text,p.tag_list,p.comment_list,p.media_filename)
AGAINST ('search query' IN BOOLEAN MODE)
GROUP BY p.media_vishash
ORDER BY p.datetime_created DESC, p.media_vishash DESC
This results in a more favorable optimization. It uses the fulltext index, and then joins to the other two tables by using indexes.
*************************** 1. row ***************************
id: 1
select_type: SIMPLE
table: p
partitions: NULL
type: fulltext
possible_keys: media_vishash,search
key: search
key_len: 0
ref: const
rows: 1
filtered: 100.00
Extra: Using where; Using temporary; Using filesort
*************************** 2. row ***************************
id: 1
select_type: SIMPLE
table: apm
partitions: NULL
type: index
possible_keys: PRIMARY,album_id,media_vishash
key: album_id
key_len: 130
ref: NULL
rows: 1
filtered: 100.00
Extra: Using where; Using index
*************************** 3. row ***************************
id: 1
select_type: SIMPLE
table: a
partitions: NULL
type: eq_ref
possible_keys: PRIMARY,album_id_type,level
key: PRIMARY
key_len: 130
ref: test.apm.album_id
rows: 1
filtered: 100.00
Extra: Using where
We can improve it a bit more. Since the media
table is read first, it needs to do lookups in album_photo_map
by media_vishash
. But that's not the leftmost column of the latter table's primary key. If you reorder the primary key, it will be able to do a more efficient join.
alter table album_photo_map drop primary key,
add primary key(media_vishash,album_id);
CodePudding user response:
PRIMARY KEY (`album_id`),
UNIQUE KEY `album_id_type` (`album_id`,`type`),
In MySQL, a PK is necessarily unique and clustered with the data. Hence, that Unique key is redundant and unnecessary.
Having multiple UNIQUE
keys is unusual.
Please provide EXPLAIN SELECT ...
for your two queries.
When timing a query, run it twice. The second run may be a lot faster. This may be due to the first run doing a lot of I/O and the second finding all the necessary data cached.
Here's something to try:
SELECT ...
FROM (your second query) AS a
JOIN album_photo_map AS apm ON apm.album_id = a.album_id
JOIN media AS p ON p.media_vishash = apm.media_vishash
WHERE ... the rest of the conditions
GROUP BY p.datetime_created, p.media_vishash
ORDER BY p.datetime_created DESC, p.media_vishash DESC
I chose to make the GROUP BY
match the ORDER BY
so as to eliminate an extra sort.
This rewrite will, I think, force it to use the FULLTEXT
index first, which seems to be the key to performance. (Notice how the EXPLAIN
shows a
being used first.)
Other notes:
- Splitting a date into 3 columns (year, month, day) is usually a bad idea. You will see this if you ever need to do a range test on the date.
- Using md5s (or UUID or other hashes) for keys can be inefficient. (This comment is not relevant to the Select in question.)
- In apm, this is redundant: KEY
album_id
(album_id
)