Home > Blockchain >  Slow count query (mariaDB)
Slow count query (mariaDB)

Time:11-01

I have a problem with the speed of query. Simple mysql query, but when I have a lot of records (currently > 1 000 000), the performance is really slow. Question is similar to this one, but can't find solution. Explain says that MySQL is using: Using where; Using index. Has anyone any suggestions to speed it up? Slow query:

select sql_no_cache
    count(distinct(`books`.`id`)) as `count`
from `books`
left join `books_genres` use index(categorie_id) on `books_genres`.`book_id` = `books`.`id`
where 1  and `books_genres`.`genre_id` in(307,380,385,384,1359,381,390,397,394,949,1390,1391,403,401,1332,393,398,1374,1397,402,984,1025,841,1027,359,577,365,1021,1023,360,368,369,370,942,1061,1348,1376,437,737,1137,1354,1384,1385,1115,1113,1114,1143,1363,593,581,583,567,978,973,576,677,825,595,826,1043,827,1077,323,324,1361,1362,1360,407,610,611,1179,608,336,831,1042,520,1079,1078,1081,352,1349,388,727,728,729,325,330,1099,616,320,1375,1138,1388,1119,1141,1140,1328,1136,1044,1103,1074,1150,1322) and `books`.`is_status` = 'active' and `books`.`master_book` = 'true' 

Result:

 -------- 
| count  |
 -------- 
| 402545 |
 -------- 
1 row in set (2.64 sec)

Explain:

 ------ ------------- -------------- -------- ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- -------------- --------- -------------------------------- --------- -------------------------- 
| id   | select_type | table        | type   | possible_keys                                                                                                                                                                                                              | key          | key_len | ref                            | rows    | Extra                    |
 ------ ------------- -------------- -------- ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- -------------- --------- -------------------------------- --------- -------------------------- 
|    1 | SIMPLE      | books_genres | index  | categorie_id                                                                                                                                                                                                               | categorie_id | 4       | NULL                           | 1866077 | Using where; Using index |
|    1 | SIMPLE      | books        | eq_ref | PRIMARY,is_status,master_book,is_status_master_book,is_status_master_book_indexed,is_status_donor_no_ru_master_book,is_status_indexed,books_idx_is_stat_master_livelib_id,master_book_is_status,is_status_master_book_year | PRIMARY      | 4       | base.books_genres.book_id |       1 | Using where              |
 ------ ------------- -------------- -------- ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- -------------- --------- -------------------------------- --------- -------------------------- 
2 rows in set (0.10 sec)

My tables:

CREATE TABLE `books` (
  `id` int(11) NOT NULL AUTO_INCREMENT COMMENT 'ID',
  `donor` enum('default','ihavebook','flibusta','litres','book24','labirint','livelib') NOT NULL DEFAULT 'default' COMMENT 'Донор',
  `donor_id` varchar(200) DEFAULT NULL COMMENT 'ID у донора',
  `name` varchar(256) DEFAULT NULL COMMENT 'Название',
  `name_int` text COMMENT 'Оригинальное название',
  `name_alt` text COMMENT 'Альтернативные названия',
  `year` int(11) DEFAULT NULL COMMENT 'Дата выхода',
  `poster` varchar(256) DEFAULT NULL COMMENT 'Путь к изображению',
  `description` text COMMENT 'Описание',
  `rating` double(4,2) NOT NULL DEFAULT '0.00' COMMENT 'Рейтинг',
  `rating_count` int(11) NOT NULL DEFAULT '0' COMMENT 'Сколько проголосовали',
  `view_count` int(11) NOT NULL DEFAULT '0' COMMENT 'Количество просмотров',
  `allowed_fragment` enum('5','10','15','20','30','40','50','60','70') DEFAULT NULL COMMENT 'Разрешенный фрагмент правообладателем (если is_toread = false)',
  `is_status` enum('active','parser','incorrect','extremist','delete','fulldeteled') NOT NULL DEFAULT 'active' COMMENT 'Статус книги',
  `is_toread` enum('true','false') NOT NULL DEFAULT 'false' COMMENT 'Возможность читать онлайн',
  `is_similar` enum('true','false') NOT NULL DEFAULT 'false' COMMENT 'Возможность посмотреть список похожих книг',
  `is_partners` enum('true','false') NOT NULL DEFAULT 'false' COMMENT 'Наличие партнерских ссылок',
  `is_duplicate` enum('true','false') NOT NULL DEFAULT 'false' COMMENT 'Проверка на дубль',
  `is_rightholder_lock` enum('true','false') NOT NULL DEFAULT 'false' COMMENT 'блокировка правообладателя',
  `rating_litres` int(11) DEFAULT NULL COMMENT 'рейтинг Литрес',
  `date_written` date DEFAULT NULL,
  `is_genre_deleted` tinyint(1) DEFAULT NULL,
  `is_description_blocked` enum('false','true') DEFAULT 'false' COMMENT 'не давать парсерам менять описание',
  `translator` varchar(255) DEFAULT NULL COMMENT 'Переводчики',
  `genres_count` tinyint(4) DEFAULT NULL,
  `checked_admin` enum('false','true') DEFAULT 'false',
  `no_ru` enum('false','true') DEFAULT 'false' COMMENT 'не русская версия',
  `count_authors` int(11) DEFAULT NULL,
  `updated_count_authors` enum('false','true') DEFAULT 'false' COMMENT 'Пересчитано количество авторов у книги',
  `checked_fb2` enum('false','true') DEFAULT 'false' COMMENT 'Проверена книга или фрагмент на метатеги',
  `indexed` enum('false','true') DEFAULT 'false' COMMENT 'Проиндексировано ElasticSearch',
  `innodb` enum('false','true') DEFAULT 'false',
  `index_popular` int(11) DEFAULT NULL COMMENT 'Индекс полпулярности',
  `donor_id2` int(11) DEFAULT NULL COMMENT 'дополнительный ID донора',
  `isbn` varchar(50) DEFAULT NULL,
  `index_popular_set` enum('false','true') DEFAULT 'false' COMMENT 'просчитан индекс популярности',
  `storage` tinyint(1) DEFAULT NULL COMMENT 'номер диска для файлов',
  `lng` tinyint(4) DEFAULT NULL COMMENT 'язык книги',
  `google_search` bigint(20) DEFAULT NULL,
  `litres_csv_rate` decimal(5,1) DEFAULT NULL,
  `litres_livelib_csv_reads` int(11) DEFAULT NULL,
  `livelib_csv_rate` decimal(5,1) DEFAULT NULL,
  `livelib_url` varchar(255) DEFAULT NULL,
  `livelib_book_id` int(11) DEFAULT NULL COMMENT 'если минус - наш id, если плюс - ЛЛ',
  `livelib_share_vk` int(11) DEFAULT NULL,
  `livelib_fav_num` int(11) DEFAULT NULL,
  `livelib_read_num` int(11) DEFAULT NULL,
  `livelib_plan_read` int(11) DEFAULT NULL,
  `livelib_comments_num` int(11) DEFAULT NULL,
  `livelib_quotes_num` int(11) DEFAULT NULL,
  `livelib_collections_num` int(11) DEFAULT NULL,
  `master_book` enum('true','false') DEFAULT 'true',
  `age` varchar(20) DEFAULT NULL,
  `num_pages` varchar(50) DEFAULT NULL,
  `old_book_id` int(11) DEFAULT NULL,
  `old_book_url` varchar(255) DEFAULT NULL,
  `checked_year` enum('false','true') DEFAULT 'false' COMMENT 'проверен ли год выпуска модераторами, чтоб отображать в библиографии',
  `checked_pages` enum('false','true') DEFAULT 'false',
  PRIMARY KEY (`id`),
  KEY `is_status` (`is_status`),
  KEY `view_count` (`view_count`),
  KEY `poster` (`poster`(255)),
  KEY `rating` (`rating`),
  KEY `rating_litres` (`rating_litres`),
  KEY `date_written` (`date_written`),
  KEY `is_description_blocked` (`is_description_blocked`),
  KEY `genres_count` (`genres_count`),
  KEY `checked_admin` (`checked_admin`),
  KEY `no_ru` (`no_ru`),
  KEY `count_authors` (`count_authors`),
  KEY `updated_count_authors` (`updated_count_authors`),
  KEY `checked_fb2` (`checked_fb2`),
  KEY `indexed` (`indexed`),
  KEY `count_authors_updated_count_authors` (`count_authors`,`updated_count_authors`),
  KEY `is_genre_deleted_is_status` (`is_genre_deleted`,`is_status`),
  KEY `updated_count_authors_is_status` (`updated_count_authors`,`is_status`),
  KEY `innodb` (`innodb`),
  KEY `index_popular` (`index_popular`),
  KEY `donor_id2` (`donor_id2`),
  KEY `year` (`year`),
  KEY `index_popular_set` (`index_popular_set`),
  KEY `donor` (`donor`),
  KEY `donor_id` (`donor_id`),
  KEY `google_search` (`google_search`),
  KEY `litres_csv_rate` (`litres_csv_rate`),
  KEY `litres_livelib_csv_reads` (`litres_livelib_csv_reads`),
  KEY `livelib_csv_rate` (`livelib_csv_rate`),
  KEY `master_book` (`master_book`),
  KEY `livelib_book_id` (`livelib_book_id`),
  KEY `storage` (`storage`),
  KEY `livelib_read_num` (`livelib_read_num`),
  KEY `old_book_id` (`old_book_id`),
  KEY `old_book_url` (`old_book_url`),
  KEY `checked_year` (`checked_year`),
  KEY `is_status_master_book` (`is_status`,`master_book`),
  KEY `lng` (`lng`),
  KEY `livelib_book_id_master_book` (`livelib_book_id`,`master_book`),
  KEY `is_status_master_book_indexed` (`is_status`,`master_book`,`indexed`),
  KEY `genres_count_master_book` (`genres_count`,`master_book`),
  KEY `count_authors_updated_count_authors_master_book` (`count_authors`,`updated_count_authors`,`master_book`),
  KEY `is_status_donor_no_ru_master_book` (`is_status`,`donor`,`no_ru`,`master_book`),
  KEY `livelib_url_master_book_is_status` (`livelib_url`,`master_book`,`is_status`),
  KEY `donor_donor_id_donor_id2_master_book_is_status` (`donor`,`donor_id`,`donor_id2`,`master_book`,`is_status`),
  KEY `lng_is_status_master_book` (`lng`,`is_status`,`master_book`),
  KEY `is_status_indexed` (`is_status`,`indexed`),
  KEY `books_idx_is_stat_master_livelib_id` (`is_status`,`master_book`,`livelib_read_num`,`id`),
  KEY `books_idx_livelib_num_id` (`livelib_read_num`,`id`),
  KEY `master_book_is_status` (`master_book`,`is_status`),
  KEY `is_status_master_book_year` (`is_status`,`master_book`,`year`),
  FULLTEXT KEY `name` (`name`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;


CREATE TABLE `books_genres` (
  `book_id` int(11) NOT NULL,
  `genre_id` int(11) NOT NULL,
  `sort` tinyint(4) DEFAULT NULL,
  PRIMARY KEY (`book_id`,`genre_id`),
  UNIQUE KEY `book_id` (`book_id`,`genre_id`),
  KEY `categorie_id` (`genre_id`),
  KEY `sort` (`sort`),
  KEY `book_id2` (`book_id`),
  KEY `genre_id_book_id` (`genre_id`,`book_id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8';

UPDATE 1 (Its still so long):

select sql_no_cache
    count(`books`.`id`) as `count`
from `books`
use index(is_status_master_book)
where 
    (
        1 = 1 
        AND `books`.`is_status` = 'active' 
        AND `books`.`master_book` = 'true'
    ) 
    AND (
        EXISTS (
            SELECT
                1 
            FROM
                `books_genres` 
            WHERE
                (
                    `books_genres`.`book_id` = `books`.`id`
                ) 
                AND (
                    `books_genres`.`genre_id` IN (307,380,385,384,1359,381,390,397,394,949,1390,1391,403,401,1332,393,398,1374,1397,402,984,1025,841,1027,359,577,365,1021,1023,360,368,369,370,942,1061,1348,1376,437,737,1137,1354,1384,1385,1115,1113,1114,1143,1363,593,581,583,567,978,973,576,677,825,595,826,1043,827,1077,323,324,1361,1362,1360,407,610,611,1179,608,336,831,1042,520,1079,1078,1081,352,1349,388,727,728,729,325,330,1099,616,320,1375,1138,1388,1119,1141,1140,1328,1136,1044,1103,1074,1150,1322)
                )
        )
    ) 

RESULT:

 -------- 
| count  |
 -------- 
| 402545 |
 -------- 
1 row (1.774 s)

Explain:

 ------ -------------- -------------- -------- -------------------------------------------------------- ----------------------- --------- ------------- --------- -------------------------- 
| id   | select_type  | table        | type   | possible_keys                                          | key                   | key_len | ref         | rows    | Extra                    |
 ------ -------------- -------------- -------- -------------------------------------------------------- ----------------------- --------- ------------- --------- -------------------------- 
|    1 | PRIMARY      | books        | ref    | is_status_master_book                                  | is_status_master_book | 3       | const,const |  975330 | Using where; Using index |
|    1 | PRIMARY      | <subquery2>  | eq_ref | distinct_key                                           | distinct_key          | 4       | func        |       1 |                          |
|    2 | MATERIALIZED | books_genres | index  | PRIMARY,book_id,categorie_id,book_id2,genre_id_book_id | PRIMARY               | 8       | NULL        | 1866103 | Using where; Using index |
 ------ -------------- -------------- -------- -------------------------------------------------------- ----------------------- --------- ------------- --------- -------------------------- 
3 rows in set (0.00 sec)

UPDATE 2:

select sql_no_cache
    count(1) as `count`
from `books`
use index(is_status_master_book_id)
where 
    (
        1 = 1 
        AND `books`.`is_status` = 'active' 
        AND `books`.`master_book` = 'true'
    ) 
    AND (
        EXISTS (
            SELECT
                1 
            FROM
                `books_genres` 
            WHERE
                (
                    `books_genres`.`book_id` = `books`.`id`
                ) 
                AND (
                    `books_genres`.`genre_id` IN (307,380,385,384,1359,381,390,397,394,949,1390,1391,403,401,1332,393,398,1374,1397,402,984,1025,841,1027,359,577,365,1021,1023,360,368,369,370,942,1061,1348,1376,437,737,1137,1354,1384,1385,1115,1113,1114,1143,1363,593,581,583,567,978,973,576,677,825,595,826,1043,827,1077,323,324,1361,1362,1360,407,610,611,1179,608,336,831,1042,520,1079,1078,1081,352,1349,388,727,728,729,325,330,1099,616,320,1375,1138,1388,1119,1141,1140,1328,1136,1044,1103,1074,1150,1322)
                )
        )
    ) 

Rusult:

 -------- 
| count  |
 -------- 
| 402553 |
 -------- 
1 row in set (3.84 sec)

UPDATE 3:

ALTER TABLE books ADD INDEX is_status_master_book_id (is_status, master_book, id);

SELECT sql_no_cache COUNT(id) AS `count`
FROM books
use index(is_status_master_book_id)
WHERE books.is_status = 'active'
   AND books.master_book = 'true' 
   AND id IN (
        SELECT book_id
        FROM books_genres
        WHERE genre_id IN (307,380,385,384,1359,381,390,397,394,949,1390,1391,403,401,1332,393,398,1374,1397,402,984,1025,841,1027,359,577,365,1021,1023,360,368,369,370,942,1061,1348,1376,437,737,1137,1354,1384,1385,1115,1113,1114,1143,1363,593,581,583,567,978,973,576,677,825,595,826,1043,827,1077,323,324,1361,1362,1360,407,610,611,1179,608,336,831,1042,520,1079,1078,1081,352,1349,388,727,728,729,325,330,1099,616,320,1375,1138,1388,1119,1141,1140,1328,1136,1044,1103,1074,1150,1322)
    )

Result:

 -------- 
| count  |
 -------- 
| 402553 |
 -------- 
1 row in set (1.47 sec)

CodePudding user response:

First, just to be sure, use ANALYZE TABLE to update the internal statistics used by MariaDB to figure out which indexes to use.

ANALYZE TABLE books, book_genres;

In MariaDB you can use ANALYZE SELECT in place of EXPLAIN SELECT on your queries. ANALYZE actually runs the query then shows the same sort of output as EXPLAIN but with more details. So, use that to do your analysis.

SELECT COUNT(), sad to say, is inherently slow on big tables. If you do this

SELECT COUNT(id) AS `count`
  FROM books
 WHERE books.is_status = 'active'
   AND books.master_book = 'true' 

you'll get a lower bound on (minimum for) the time your query will take. Your index called is_status_master_book will help with this query unless most of your books are 'active' and 'master,' in which case the server chooses a table scan because the index is not selective enough. Get this query working and study the ANALYZE output. (InnoDB indexes always append the primary key to the end of the list of columns.)

(Notice that your is_status index is redundant with your is_status_master_book index.)

Next, work on this shorter version of your genre lookup.

               SELECT book_id
                 FROM books_genres
                WHERE genre_id in (307,380,385,384,1359,381,390,397,394)

This query should use the genre_id_book_id on your books_genres table. Does it work? Does it perform tolerably well? If so, try it again with all the genre_ids.

If its performance gets noticeably worse with the longer list, you might try refactoring your app to put that long list into a temporary table and doing

               SELECT book_id
                 FROM books_genres
                 JOIN temptable ON books_genres.genre_id = temptable.genre_id

As a next step, try this.

SELECT COUNT(id) AS `count`
  FROM books
 WHERE books.is_status = 'active'
   AND books.master_book = 'true' 
   AND id IN (
         SELECT book_id
           FROM books_genres
          WHERE genre_id IN ( short list )
       )

How does this perform? Look at the ANALYZE output. Try using a different index; it may help if id is more selective than is_status and master_book.

ALTER TABLE books ADD INDEX id_status_master (id, is_status, master);

Then try the longer list of genres in the above query. That query is equivalent to your query.

CodePudding user response:

You have some redundant indexes, I would get rid of those first... not that they would have impact on queries, but I'll explain.

If you have an index on (a, b) another index on (b, a), and you run a query qualifying BOTH columns, then either will work. However, if you only need column b as part of a query, have an index on that. The (a, b) would also be used if querying column a anything else.

So your book table indexes of

is_status (is_status) is_status_master_book (is_status, master_book)

the single "is_status" index is redundant. By looking at some of the other indexes, there might be better indexing options but without knowing context of your queries, I'll leave alone.

Now, to help one more level, for this specific query, you probably need to make a COVERING index. This basically means the index has all the columns requested by the query as part of the index, so the engine never needs to go to the individual raw pages of record data to get/qualify the results.

So, I would change your "Is_Status_Master_Book" index to

is_status_master_book (is_status, master_book, id)

Similarly in your book genre table which you have with your unique key 'Book_id' which has both the book ID and genre. But also makes the "genre_id_book_id" redundant, but can leave the categorie_id index as that covers the genre_id if there is ever a genre_id only query being performed.

Now, for your query. You have a left-join, but had an "AND" Genre_id in the where clause making it an INNER JOIN.

    select sql_no_cache
          count( distinct b.id ) as `count`
       from 
          books b
             join books_genres bg
                on b.id = bg.book_id
       where 
              1  
          and b.is_status = 'active' 
          and b.master_book = 'true' 
          and bg.genre_id in (307,380,385,384,1359,381,390,397,394,949,1390,1391,403,401,1332,393,
398,1374,1397,402,984,1025,841,1027,359,577,365,1021,1023,360,368,369,
370,942,1061,1348,1376,437,737,1137,1354,1384,1385,1115,1113,1114,1143,
1363,593,581,583,567,978,973,576,677,825,595,826,1043,827,1077,323,324,
1361,1362,1360,407,610,611,1179,608,336,831,1042,520,1079,1078,1081,352,
1349,388,727,728,729,325,330,1099,616,320,1375,1138,1388,1119,1141,1140,
1328,1136,1044,1103,1074,1150,1322) 
        

What is the true intent of your query... Do you want a count of all Active status books that are master books AND they meet one of the genres listed? If so, that is your query (less the left-join portion), but update the index is_status_master_book as suggested to make it covering so it does not need to go to the raw data pages to qualify records and return results.

An alternative would be do pre-query the distinct books within the qualifying genres.

    select sql_no_cache
          count( distinct b.id ) as `count`
       from 
          books b
             join 
             ( select distinct bg.book_id
                  from books_genres bg
                  where bg.genre_id in (307,380,385,384,1359,381,390,397,394,949,1390,1391,403,401,1332,393,
398,1374,1397,402,984,1025,841,1027,359,577,365,1021,1023,360,368,369,
370,942,1061,1348,1376,437,737,1137,1354,1384,1385,1115,1113,1114,1143,
1363,593,581,583,567,978,973,576,677,825,595,826,1043,827,1077,323,324,
1361,1362,1360,407,610,611,1179,608,336,831,1042,520,1079,1078,1081,352,
1349,388,727,728,729,325,330,1099,616,320,1375,1138,1388,1119,1141,1140,
1328,1136,1044,1103,1074,1150,1322) ) PQ
                on b.id = PQ.book_id
       where 
              1  
          and b.is_status = 'active' 
          and b.master_book = 'true' 
  • Related