Home > Software design >  How should i properly index the mysql column when dealing with sort?
How should i properly index the mysql column when dealing with sort?

Time:05-18

I have a log table, but I find it become very slow when I sort it.

Here's my database table structure in short.

CREATE TABLE `webhook_logs` (
  `ID` bigint(20) UNSIGNED NOT NULL,
  `event_id` bigint(20) UNSIGNED DEFAULT NULL,
  `object_id` bigint(20) UNSIGNED DEFAULT NULL,
  `occurred_at` bigint(20) UNSIGNED DEFAULT NULL,
  `payload` text COLLATE utf8mb4_unicode_520_ci,
  `priority` bigint(1) UNSIGNED DEFAULT NULL,
  `status` varchar(32) COLLATE utf8mb4_unicode_520_ci DEFAULT NULL
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_520_ci;

ALTER TABLE `webhook_logs`
  ADD PRIMARY KEY (`ID`),
  ADD KEY `event_id` (`event_id`),
  ADD KEY `object_id` (`object_id`),
  ADD KEY `occurred_at` (`occurred_at`),
  ADD KEY `priority` (`priority`),
  ADD KEY `status` (`status`);

There are 5M records.

When I do

SELECT * FROM `webhook_logs` WHERE status = 'pending' AND occurred_at < 1652838913000 ORDER BY priority ASC LIMIT 100

, it took about 5 seconds to get the records.


However, when i remove the sorting, and just do

SELECT * FROM `webhook_logs` WHERE status = 'pending' AND occurred_at < 1652838913000 LIMIT 100

, it took only 0.0022 seconds.


I've been playing around with the index and see if the time improved, but with no luck. I wonder if I'm doing something wrong here.

I tried creating combo index with "occurred_at" and "priority", or combo index with all "occurred_at", "priority" and "status". None of them improved the speed, still take around 5 seconds. If any help, there server is running MYSQL 5.7.12.

Any help will be appropriated. Thanks.

CodePudding user response:

A multi column index may work here, but the one you used does not really cover any entire part of the query pipeline. Consider using this version:

CREATE INDEX idx ON webhook_logs (occurred_at, status, priority);

This index covers both the WHERE and ORDER BY clauses. It does not cover the SELECT, but that may not matter if MySQL still decides to use this index. Note that the ordering/sorting step will occur after the filter in the WHERE clause, hence priority should appear last in the index.

CodePudding user response:

Pure index can't solve your problem. In your query, the DB must first find out all records where "occurred_at < 1652838913000" and then sort them to get the records with highest priority. No index can help to reduce the sort.

But there are solutions to your problem, because priority always has only serveral values. You can create an index(status, priority, occurred_at), and then write a query like this:

select * from (
(SELECT * FROM `webhook_logs` WHERE status = 'pending' and priority=1 AND occurred_at < 1652838913000 LIMIT 100)
union
(SELECT * FROM `webhook_logs` WHERE status = 'pending' and priority=2 AND occurred_at < 1652838913000 LIMIT 100)
) a ORDER BY priority asc LIMIT 100

In this query, DB will use the index to do each sub query of the union, and then sort only very few rows.

  • Related