Speed up searches on large mysql table using indexes

Time:11-21

I have a large MySQL table. When it had a small number of records, searches were fast, but the table now has over 400,000 records and searches take between 0.60 and 0.75 seconds. I've tried using indexes to bring this closer to 0.10 seconds, or at least below 0.60 seconds, with no success. These timings come from calling microtime in PHP immediately before and after the SQL query.

This is a snippet of the table structure output from SHOW CREATE TABLE. The table has other fields, but they aren't used in the SQL search statement.

CREATE TABLE `mytable` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `url` longtext COLLATE utf8_unicode_ci NOT NULL,
  `url_sha512hash` char(128) COLLATE utf8_unicode_ci NOT NULL DEFAULT 'no hash.',
  `viewdate` timestamp NOT NULL DEFAULT '0000-00-00 00:00:00',
  PRIMARY KEY (`id`),
  UNIQUE KEY `uniq-web_url` (`url_sha512hash`),
  KEY `idx-viewdate` (`viewdate`)
) ENGINE=MyISAM AUTO_INCREMENT=404899 DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci

This is the SQL statement, which takes between 0.60 and 0.75 seconds, that I would like to speed up.

SELECT `id`, `url` FROM `mytable` USE INDEX(`idx-viewdate`)
  WHERE (`url` LIKE "INSERT URL"
   AND `viewdate` > "2022-11-20 23:23:00")
   OR (`url` LIKE "INSERT SAME URL AS BEFORE/" AND `viewdate` > "INSERT SAME VIEW DATE AS BEFORE")
  ORDER BY `id` DESC;

Output from EXPLAIN

+------+-------------+---------+-------+---------------+--------------+---------+------+------+----------------------------------------------------+
| id   | select_type | table   | type  | possible_keys | key          | key_len | ref  | rows | Extra                                              |
+------+-------------+---------+-------+---------------+--------------+---------+------+------+----------------------------------------------------+
|    1 | SIMPLE      | mytable | range | idx-viewdate  | idx-viewdate | 4       | NULL |   28 | Using index condition; Using where; Using filesort |
+------+-------------+---------+-------+---------------+--------------+---------+------+------+----------------------------------------------------+

Answer:

Create a single index on url, viewdate, id.

  • This minimises the number of rows MySQL must examine to find the matches.

As per comments, use InnoDB, not MyISAM.

Then use this WHERE clause...

WHERE
  viewdate > ?
  AND url IN (?, CONCAT(?,'/'))

LIKE is expensive, and unnecessary if you're looking for exact matches.

IN() is still more expensive than =, but this form makes it clear you're looking for one or two exact matches.
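There is also a correctness reason to prefer exact matching here: LIKE treats `_` (any single character) and `%` as wildcards, and URLs often contain `_`. The sketch below uses Python's built-in sqlite3 module as a stand-in for MySQL, assuming LIKE's `_` wildcard behaves the same way in both (it does for this simple case). The table contents and URLs are hypothetical.

```python
import sqlite3

# In-memory SQLite database standing in for MySQL (illustration only).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytable (id INTEGER PRIMARY KEY, url TEXT NOT NULL)")
conn.executemany(
    "INSERT INTO mytable (url) VALUES (?)",
    [
        ("https://example.com/a_b",),   # hypothetical URL containing "_"
        ("https://example.com/a-b",),   # differs only where the "_" is
        ("https://example.com/a_b/",),  # same URL with a trailing slash
    ],
)

search = "https://example.com/a_b"
params = (search, search + "/")

# Original style: the URL is used as a LIKE *pattern*, so "_" matches any character.
like_urls = [r[0] for r in conn.execute(
    "SELECT url FROM mytable WHERE url LIKE ? OR url LIKE ? ORDER BY id", params)]

# Suggested style: exact comparison against the two forms of the URL.
in_urls = [r[0] for r in conn.execute(
    "SELECT url FROM mytable WHERE url IN (?, ?) ORDER BY id", params)]

print(like_urls)  # all three rows, including the unrelated ".../a-b"
print(in_urls)    # only ".../a_b" and ".../a_b/"
```

If LIKE must be kept for some reason, the `_` and `%` characters in the search value would need escaping (MySQL's default LIKE escape character is backslash); the IN() form avoids the issue entirely.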

Finally, be VERY certain to use parameterised queries / prepared statements. Do NOT manually substitute strings into your query using string manipulation in PHP.
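As a rough sketch of the whole suggestion (composite index plus a bound-parameter query), here is the same idea with Python's sqlite3 module standing in for MySQL/PHP. Assumptions: SQLite instead of MySQL, an index on (url, viewdate) as in the other answer (SQLite, like InnoDB, keeps the row id in every index), and the slash variant built in application code rather than with CONCAT(?,'/'). The `find_recent_views` helper and the sample rows are hypothetical. PHP's mysqli and PDO prepared statements also accept `?` placeholders.

```python
import sqlite3

# Same shape as the suggested WHERE clause; every value is a bound "?" parameter.
QUERY = (
    "SELECT id, url FROM mytable "
    "WHERE viewdate > ? AND url IN (?, ?) "
    "ORDER BY id DESC"
)

def find_recent_views(conn, url, since):
    """Rows for `url` or `url + '/'` viewed after `since`, newest id first."""
    # The trailing-slash variant is built here instead of CONCAT(?, '/') in SQL.
    return conn.execute(QUERY, (since, url, url + "/")).fetchall()

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE mytable ("
    "id INTEGER PRIMARY KEY, url TEXT NOT NULL, viewdate TEXT NOT NULL)"
)
# Equality column (url) first, then the range column (viewdate).
conn.execute("CREATE INDEX idx_url_viewdate ON mytable (url, viewdate)")
conn.executemany(
    "INSERT INTO mytable (url, viewdate) VALUES (?, ?)",
    [
        ("https://example.com/page", "2022-11-20 10:00:00"),   # too old
        ("https://example.com/page", "2022-11-21 09:00:00"),
        ("https://example.com/page/", "2022-11-21 12:00:00"),
        ("https://example.com/other", "2022-11-21 12:00:00"),  # different URL
    ],
)

rows = find_recent_views(conn, "https://example.com/page", "2022-11-20 23:23:00")
print(rows)  # [(3, 'https://example.com/page/'), (2, 'https://example.com/page')]

# Ask SQLite how it would run the query; the last column names the index used.
plan = conn.execute("EXPLAIN QUERY PLAN " + QUERY, ("x", "y", "y/")).fetchall()
for row in plan:
    print(row[-1])
```

In MySQL the equivalent check is running EXPLAIN on the real query and confirming the new index appears in the `key` column.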

EDIT

Alternatively, put the hash column in the index, and hash both forms of the URL parameter (with and without the trailing slash) when searching.

ALTER TABLE `mytable` ADD INDEX `idx-visited2` (`url_sha512hash`, `viewdate`, `id`);

WHERE
  viewdate > ?                  -- DATE
  AND url_sha512hash IN (?, ?)  -- HASHED_URL, HASHED_URL_WITH_SLASH
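A minimal sketch of computing the two hash parameters, assuming `url_sha512hash` stores the lowercase hexadecimal SHA-512 of the URL's UTF-8 bytes (the char(128) column width fits a hex digest, but the exact format is an assumption). PHP's `hash('sha512', $url)` and MySQL's `SHA2(url, 512)` also produce lowercase hex, so they should agree for identical bytes. Rows still holding the column default 'no hash.' would never match. The `url_hashes` helper is hypothetical, and the `%s` placeholders reflect common Python MySQL drivers such as PyMySQL; adjust them for your driver.

```python
import hashlib
from typing import Tuple

def url_hashes(url: str) -> Tuple[str, str]:
    """SHA-512 hex digests of `url` and `url + '/'` (assumed storage format)."""
    plain = hashlib.sha512(url.encode("utf-8")).hexdigest()
    with_slash = hashlib.sha512((url + "/").encode("utf-8")).hexdigest()
    return plain, with_slash

hashed_url, hashed_url_with_slash = url_hashes("https://example.com/page")
print(len(hashed_url))  # 128 hex characters, matching char(128)

# Query text and parameters kept separate; the driver binds the values.
sql = (
    "SELECT id, url FROM mytable "
    "WHERE viewdate > %s AND url_sha512hash IN (%s, %s) "
    "ORDER BY id DESC"
)
params = ("2022-11-20 23:23:00", hashed_url, hashed_url_with_slash)
```

Searching on a fixed-width hash avoids indexing the long URL text at all, at the cost of keeping the hash column in sync with `url` on every insert or update.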

Answer:

What version of MySQL are you running? Do you have control of the server or is it shared hosting?

The output from the following queries would be helpful for context:

SELECT VERSION();
ANALYZE TABLE `mytable`;
SHOW INDEX FROM `mytable`;
SHOW TABLE STATUS LIKE 'mytable';
SHOW VARIABLES LIKE 'innodb_buffer_pool_size';

The output of the following query will help with understanding the distribution of your url data (it may be slow, so be cautious if the server is under load):

SELECT
    COUNT(*) countAll,
    COUNT(IF(LENGTH(url) <= 32, 1, NULL)) count32,
    COUNT(DISTINCT LEFT(url, 32)) distinct32,
    COUNT(IF(LENGTH(url) <= 48, 1, NULL)) count48,
    COUNT(DISTINCT LEFT(url, 48)) distinct48,
    COUNT(IF(LENGTH(url) <= 64, 1, NULL)) count64,
    COUNT(DISTINCT LEFT(url, 64)) distinct64,
    COUNT(IF(LENGTH(url) <= 80, 1, NULL)) count80,
    COUNT(DISTINCT LEFT(url, 80)) distinct80,
    COUNT(IF(LENGTH(url) <= 96, 1, NULL)) count96,
    COUNT(DISTINCT LEFT(url, 96)) distinct96
FROM mytable;
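The idea behind that query is to find the shortest prefix length N at which COUNT(DISTINCT LEFT(url, N)) gets close to COUNT(*), so that a prefix index on url(N) stays selective. The sketch below performs the same calculation on an in-memory list; the `prefix_selectivity` helper and the sample URLs are hypothetical. Unlike MySQL with utf8_unicode_ci, this comparison is case-sensitive.

```python
from typing import Dict, Iterable, List

def prefix_selectivity(urls: Iterable[str], lengths: List[int]) -> Dict[int, float]:
    """Ratio of distinct N-character prefixes to total rows, for each N.

    Mirrors COUNT(DISTINCT LEFT(url, N)) / COUNT(*); 1.0 means every prefix
    is unique, so an index on url(N) would fully distinguish the rows.
    """
    urls = list(urls)
    total = len(urls)
    if total == 0:
        return {n: 0.0 for n in lengths}
    return {n: len({u[:n] for u in urls}) / total for n in lengths}

sample = [
    "https://example.com/blog/2022/11/first-post",
    "https://example.com/blog/2022/11/second-post",
    "https://example.com/blog/2022/10/first-post",
    "https://example.com/shop/item-1",
]

ratios = prefix_selectivity(sample, [16, 32, 48])
print(ratios)  # {16: 0.25, 32: 0.75, 48: 1.0}
```

With real data, the point where the ratio stops improving much is a reasonable candidate for the prefix length in the index definition.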

As suggested by @MatBailie, you should add a composite index on url and viewdate, but do not include id (with InnoDB, every secondary index already contains the primary key implicitly).

When adding an index on a TEXT/BLOB column, you must specify a prefix length (the number of characters to index). A LONGTEXT column can hold values of up to 4 GB, which is far too large to index in full.

You also need to change the data type of your url column to VARCHAR. The current maximum length is 237 characters, so bring it down to whatever limit you are comfortable with: 256 would give you some headroom, but you may prefer 300.

ALTER TABLE `mytable` 
    CHANGE COLUMN `url` `url` VARCHAR(300) NOT NULL,
    ADD INDEX `idx_url_viewed` (`url`(64), `viewdate`);

Please provide updated EXPLAIN output for your query, after applying the changes.
