Struggling to understand why my query is taking ~8-12s to complete when my table has millions of rows. I can force it to use a specific index, and depending on which one it will range from 7s-20s to execute.
Here is the query:
SELECT
thread, msgProtocol, MAX(newestMsg) AS newestMsg,
SUM(total) AS total, SUM(unread) AS unread
FROM
(SELECT
msgSender AS thread, msgProtocol,
MAX(IF(msgSent = -1, msgSendDate, tblMsgs.dateCreated)) AS newestMsg,
COUNT(msgID) AS total,
SUM(msgRead = 0 AND (msgRecipient = '5555555555' OR msgRecipient = 'chat@eid:19')) AS unread
FROM tblMsgs
WHERE msgRecipient IN('5555555555', 'chat@eid:19') AND msgStatus != 'D'
GROUP BY msgSender
ORDER BY newestMsg DESC
LIMIT 0, 30) a
UNION ALL
(SELECT
msgRecipient AS thread, msgProtocol,
MAX(IF(msgSent = -1, msgSendDate, tblMsgs.dateCreated)) AS newestMsg,
COUNT(msgID) AS total,
SUM(msgRead = 0 AND msgRecipient = 'chat@eid:19') AS unread
FROM tblMsgs
WHERE msgSender IN('5555555555', 'chat@eid:19') AND msgStatus != 'D'
GROUP BY msgRecipient
ORDER BY newestMsg DESC
LIMIT 0, 30)
ORDER BY newestMsg DESC
LIMIT 0, 30;
The basic premise of this is a chat/texting application that stores messages where the sender and recipient are in two separate columns. Whether a message is "incoming" or "outgoing" is determined in the application. This query is trying to pull a list of most recently updated "threads" or conversations. This data updates in real-time as one would expect for a chat app, so it's crucial it's quick. The first half of the UNION grabs what are effectively the most recently updated incoming threads, and the second half outgoing. They are then joined together and grouped again so I only see the latest 30 threads.
Here is the structure of the table:
CREATE TABLE `tblMsgs` (
`dateCreated` datetime(3) NOT NULL DEFAULT CURRENT_TIMESTAMP(3),
`msgID` int NOT NULL AUTO_INCREMENT,
`msgStatus` varchar(1) CHARACTER SET utf8mb4 COLLATE utf8mb4_0900_ai_ci NOT NULL DEFAULT 'A',
`msgSender` varchar(255) CHARACTER SET utf8mb4 COLLATE utf8mb4_0900_ai_ci DEFAULT NULL,
`msgRecipient` varchar(255) CHARACTER SET utf8mb4 COLLATE utf8mb4_0900_ai_ci DEFAULT NULL,
`msgProtocol` varchar(25) CHARACTER SET utf8mb4 COLLATE utf8mb4_0900_ai_ci DEFAULT 'sms',
`msgBody` varchar(3000) CHARACTER SET utf8mb4 COLLATE utf8mb4_0900_ai_ci NOT NULL DEFAULT '',
`msgAttachments` mediumtext CHARACTER SET utf8mb4 COLLATE utf8mb4_0900_ai_ci,
`msgRead` int NOT NULL DEFAULT '0',
`msgSent` int NOT NULL DEFAULT '0',
`msgDelivered` int DEFAULT '0',
`msgSendDate` datetime(3) DEFAULT NULL,
PRIMARY KEY (`msgID`),
KEY `recipient` (`msgRecipient`,`msgStatus`),
KEY `sender` (`msgSender`,`msgStatus`)
) /*!50100 TABLESPACE `innodb_system` */ ENGINE=InnoDB AUTO_INCREMENT=2370188 DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_0900_ai_ci;
Here is the EXPLAIN:
id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
---|---|---|---|---|---|---|---|---|---|---|---|
1 | PRIMARY | derived2 | ALL | 30 | 100.00 | ||||||
2 | DERIVED | tblMsgs | index | recipient,sender | sender | 1029 | 1802478 | 27.01 | Using where; Using temporary; Using filesort | ||
3 | UNION | tblMsgs | index | recipient,sender | recipient | 1029 | 1802478 | 50.21 | Using where; Using temporary; Using filesort | ||
UNION RESULT | union1,3 | ALL | Using temporary; Using filesort |
I've also tried indexing msgSendDate/dateCreated alone and together and that doesn't seem to help. I've tried forcing both halves of the UNION to use the index "recipient" and "sender" (respectively) and it doesn't help. I'm not sure why the second half doesn't prefer the "sender" index and runs slower with it?
Even using a really simple statement like this will take ~7s (this chooses the "recipient" index as well, and is even slower @ ~9s with "sender"):
SELECT * FROM tblMsgs
WHERE msgSender = '5555555555' AND msgStatus <> 'D'
GROUP BY msgRecipient
ORDER BY msgID DESC
LIMIT 0, 30;
EXPLAIN for this:
id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
---|---|---|---|---|---|---|---|---|---|---|---|
1 | SIMPLE | tblMsgs | index | recipient,sender | recipient | 1029 | 1802454 | 50.21 | Using where; Using temporary; Using filesort |
CodePudding user response:
Based on logic you should not need this union -- the following should give you the same result
SELECT
msgSender AS thread,
msgProtocol,
MAX(IF(msgSent = -1, msgSendDate, dateCreated)) AS newestMsg,
COUNT(msgID) AS total,
SUM(CASE msgRead = 0 AND (msgRecipient = '5555555555' OR msgRecipient = 'chat@eid:19') THEN 1 ELSE 0 END) AS unread
FROM tblMsgs
WHERE (msgRecipient = '5555555555' OR msgRecipient = 'chat@eid:19') AND msgStatus != 'D'
GROUP BY thread
ORDER BY newestMsg DESC
LIMIT 0, 30;
By changing to 1 query from 3 the optimizer should pick the best indexes and give you the best performance.
Also, if msgSendDate is null when no messages are sent you can change the 3rd column to
MAX(COALESCE(msgSendDate, dateCreated)) AS newestMsg,
which would be much faster than an IF
CodePudding user response:
Like @Hogan suggested, you can re-write the query, and do this:
SELECT
IF(msgRecipient IN('5555555555', 'chat@eid:19'),msgSender,msgRecipient) AS thread,
msgProtocol,
MAX(IF(msgSent = -1, msgSendDate, tblMsgs.dateCreated)) AS newestMsg,
COUNT(msgID) AS total,
SUM(msgRead = 0 AND (msgRecipient = '5555555555' OR msgRecipient = 'chat@eid:19')) AS unread
FROM tblMsgs
WHERE (msgRecipient IN('5555555555', 'chat@eid:19') OR msgSender IN('5555555555', 'chat@eid:19'))
AND msgStatus != 'D'
GROUP BY 1, msgProtocol
ORDER BY newestMsg DESC
LIMIT 0, 30;
see: DBFIDDLE
- As far as I can see, this will use index (checked by removing the comment before
EXPLAIN
in the dbfiddle) - I cannot tell if it is performant enough, because a reproduceable situation is missing (see
Some updates to (TRY TO) reproduce actual data
in the dbfiddle)