Home > Software design >  Slow MySQL query on 2mill record table despite indexes
Slow MySQL query on 2mill record table despite indexes

Time:05-15

Struggling to understand why my query is taking ~8-12s to complete when my table has millions of rows. I can force it to use a specific index, and depending on which one it will range from 7s-20s to execute.

Here is the query:

SELECT 
    thread, msgProtocol, MAX(newestMsg) AS newestMsg, 
    SUM(total) AS total, SUM(unread) AS unread

FROM

(SELECT 
    msgSender AS thread, msgProtocol, 
    MAX(IF(msgSent = -1, msgSendDate, tblMsgs.dateCreated)) AS newestMsg, 
    COUNT(msgID) AS total, 
    SUM(msgRead = 0 AND (msgRecipient = '5555555555' OR msgRecipient = 'chat@eid:19')) AS unread
FROM tblMsgs
WHERE msgRecipient IN('5555555555', 'chat@eid:19') AND msgStatus != 'D'
GROUP BY msgSender
ORDER BY newestMsg DESC
LIMIT 0, 30) a

UNION ALL

(SELECT 
    msgRecipient AS thread, msgProtocol, 
    MAX(IF(msgSent = -1, msgSendDate, tblMsgs.dateCreated)) AS newestMsg, 
    COUNT(msgID) AS total, 
    SUM(msgRead = 0 AND msgRecipient = 'chat@eid:19') AS unread
FROM tblMsgs
WHERE msgSender IN('5555555555', 'chat@eid:19') AND msgStatus != 'D'
GROUP BY msgRecipient
ORDER BY newestMsg DESC
LIMIT 0, 30)

ORDER BY newestMsg DESC
LIMIT 0, 30;

The basic premise of this is a chat/texting application that stores messages where the sender and recipient are in two separate columns. Whether a message is "incoming" or "outgoing" is determined in the application. This query is trying to pull a list of most recently updated "threads" or conversations. This data updates in real-time as one would expect for a chat app, so it's crucial it's quick. The first half of the UNION grabs what are effectively the most recently updated incoming threads, and the second half outgoing. They are then joined together and grouped again so I only see the latest 30 threads.

Here is the structure of the table:

CREATE TABLE `tblMsgs` (
  `dateCreated` datetime(3) NOT NULL DEFAULT CURRENT_TIMESTAMP(3),
  `msgID` int NOT NULL AUTO_INCREMENT,
  `msgStatus` varchar(1) CHARACTER SET utf8mb4 COLLATE utf8mb4_0900_ai_ci NOT NULL DEFAULT 'A',
  `msgSender` varchar(255) CHARACTER SET utf8mb4 COLLATE utf8mb4_0900_ai_ci DEFAULT NULL,
  `msgRecipient` varchar(255) CHARACTER SET utf8mb4 COLLATE utf8mb4_0900_ai_ci DEFAULT NULL,
  `msgProtocol` varchar(25) CHARACTER SET utf8mb4 COLLATE utf8mb4_0900_ai_ci DEFAULT 'sms',
  `msgBody` varchar(3000) CHARACTER SET utf8mb4 COLLATE utf8mb4_0900_ai_ci NOT NULL DEFAULT '',
  `msgAttachments` mediumtext CHARACTER SET utf8mb4 COLLATE utf8mb4_0900_ai_ci,
  `msgRead` int NOT NULL DEFAULT '0',
  `msgSent` int NOT NULL DEFAULT '0',
  `msgDelivered` int DEFAULT '0',
  `msgSendDate` datetime(3) DEFAULT NULL,
  PRIMARY KEY (`msgID`),
  KEY `recipient` (`msgRecipient`,`msgStatus`),
  KEY `sender` (`msgSender`,`msgStatus`)
) /*!50100 TABLESPACE `innodb_system` */ ENGINE=InnoDB AUTO_INCREMENT=2370188 DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_0900_ai_ci;

Here is the EXPLAIN:

id select_type table partitions type possible_keys key key_len ref rows filtered Extra
1 PRIMARY derived2 ALL 30 100.00
2 DERIVED tblMsgs index recipient,sender sender 1029 1802478 27.01 Using where; Using temporary; Using filesort
3 UNION tblMsgs index recipient,sender recipient 1029 1802478 50.21 Using where; Using temporary; Using filesort
UNION RESULT union1,3 ALL Using temporary; Using filesort

I've also tried indexing msgSendDate/dateCreated alone and together and that doesn't seem to help. I've tried forcing both halves of the UNION to use the index "recipient" and "sender" (respectively) and it doesn't help. I'm not sure why the second half doesn't prefer the "sender" index and runs slower with it?

Even using a really simple statement like this will take ~7s (this chooses the "recipient" index as well, and is even slower @ ~9s with "sender"):

SELECT * FROM tblMsgs 
WHERE msgSender = '5555555555' AND msgStatus <> 'D'
GROUP BY msgRecipient
ORDER BY msgID DESC
LIMIT 0, 30;

EXPLAIN for this:

id select_type table partitions type possible_keys key key_len ref rows filtered Extra
1 SIMPLE tblMsgs index recipient,sender recipient 1029 1802454 50.21 Using where; Using temporary; Using filesort

CodePudding user response:

Based on logic you should not need this union -- the following should give you the same result

SELECT 
  msgSender AS thread,
  msgProtocol, 
  MAX(IF(msgSent = -1, msgSendDate, dateCreated)) AS newestMsg, 
  COUNT(msgID) AS total, 
  SUM(CASE msgRead = 0 AND (msgRecipient = '5555555555' OR msgRecipient = 'chat@eid:19') THEN 1 ELSE 0 END) AS unread
FROM tblMsgs
WHERE (msgRecipient = '5555555555' OR msgRecipient = 'chat@eid:19') AND msgStatus != 'D'
GROUP BY thread
ORDER BY newestMsg DESC
LIMIT 0, 30;

By changing to 1 query from 3 the optimizer should pick the best indexes and give you the best performance.

Also, if msgSendDate is null when no messages are sent you can change the 3rd column to

  MAX(COALESCE(msgSendDate, dateCreated)) AS newestMsg, 

which would be much faster than an IF

CodePudding user response:

Like @Hogan suggested, you can re-write the query, and do this:

SELECT 
   IF(msgRecipient IN('5555555555', 'chat@eid:19'),msgSender,msgRecipient) AS thread, 
   msgProtocol, 
   MAX(IF(msgSent = -1, msgSendDate, tblMsgs.dateCreated)) AS newestMsg, 
   COUNT(msgID) AS total, 
   SUM(msgRead = 0 AND (msgRecipient = '5555555555' OR msgRecipient = 'chat@eid:19')) AS unread
FROM tblMsgs
WHERE (msgRecipient IN('5555555555', 'chat@eid:19') OR msgSender IN('5555555555', 'chat@eid:19'))
  AND msgStatus != 'D'
GROUP BY 1, msgProtocol
ORDER BY newestMsg DESC
LIMIT 0, 30;

see: DBFIDDLE

  • As far as I can see, this will use index (checked by removing the comment before EXPLAIN in the dbfiddle)
  • I cannot tell if it is performant enough, because a reproduceable situation is missing (see Some updates to (TRY TO) reproduce actual data in the dbfiddle)
  • Related