I have what seems to be a fairly straightforward query, but it is super slow and I would like to improve its performance if I can.
SELECT `contacts`.`unit_id`, `contacts`.`owner_id`, `units`.`description`,
`units`.`address`, `owners`.`name`, `owners`.`email`, COUNT(*) AS contact_count
FROM `contacts`
LEFT JOIN `units` ON `contacts`.`unit_id` = `units`.`id`
LEFT JOIN `owners` ON `contacts`.`owner_id` = `owners`.`id`
WHERE `owners.group_id` = 6
AND `contacts`.`checkin` BETWEEN '2021-10-01 00:00:00' AND '2021-10-31 23:59:59'
GROUP BY `units`.`id`
ORDER BY `contact_count` DESC
LIMIT 20;
I'm just trying to get the units with the most contacts in a given date range, and belonging to a certain group of owners.
------ ------------- ---------- -------- -------------------------------------------------- --------------------------- --------- ------------------------- ------ ---------------------------------
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
------ ------------- ---------- -------- -------------------------------------------------- --------------------------- --------- ------------------------- ------ ---------------------------------
| 1 | SIMPLE | owners | ref | PRIMARY,owners_group_id_foreign | owners_group_id_foreign | 4 | const | 1133 | Using temporary; Using filesort |
| 1 | SIMPLE | contacts | ref | contacts_checkin_index,contacts_owner_id_foreign | contacts_owner_id_foreign | 4 | appdb.owners.id | 1145 | Using where |
| 1 | SIMPLE | units | eq_ref | PRIMARY | PRIMARY | 4 | appdb.contacts.unit_id | 1 | |
------ ------------- ---------- -------- -------------------------------------------------- --------------------------- --------- ------------------------- ------ ---------------------------------
As near as I can tell, everything that should be indexed is:
CREATE TABLE `contacts` (
`id` int(10) unsigned NOT NULL AUTO_INCREMENT,
`owner_id` int(10) unsigned NOT NULL,
`unit_id` int(10) unsigned NOT NULL,
`terminal_id` int(10) unsigned NOT NULL,
`checkin` datetime NOT NULL
PRIMARY KEY (`id`),
KEY `contacts_checkin_index` (`checkin`),
KEY `contacts_unit_id_foreign` (`unit_id`),
KEY `contacts_terminal_id_foreign` (`terminal_id`),
KEY `contacts_owner_id_foreign` (`owner_id`),
CONSTRAINT `contacts_unit_id_foreign` FOREIGN KEY (`unit_id`) REFERENCES `units` (`id`) ON DELETE CASCADE ON UPDATE CASCADE,
CONSTRAINT `contacts_terminal_id_foreign` FOREIGN KEY (`terminal_id`) REFERENCES `terminals` (`id`) ON DELETE CASCADE ON UPDATE CASCADE,
CONSTRAINT `contacts_owner_id_foreign` FOREIGN KEY (`owner_id`) REFERENCES `owners` (`id`) ON DELETE CASCADE ON UPDATE CASCADE
) ENGINE=InnoDB AUTO_INCREMENT=25528530 DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_ci
The contacts
table currently has about 10 million rows, and this query takes about 4 minutes to run. Is this anything that can be improved significantly or am I just bumping up against limitations of my hardware at this point?
CodePudding user response:
SELECT sub.unit_id, sub.owner_id, u.`description`, u.`address`,
sub.name, sub.email,
sub.contact_count
FROM
( SELECT c.`unit_id`, c.`owner_id`,
u.`description`, u.`address`,
COUNT(*) AS contact_count
FROM `contacts` AS c
JOIN `owners` AS o ON c.`owner_id` = o.`id`
WHERE `o.group_id` = 6
AND c.`checkin` >= '2021-10-01'
AND c.`checkin` < '2021-10-01 INTERVAL 1 MONTH
GROUP BY c.`unit_id`
ORDER BY `contact_count` DESC
LIMIT 20
) AS sub
LEFT JOIN `units` AS u ON c.`unit_id` = u.`id`
ORDER BY `contact_count` DESC, c.unit_id DESC;
Notes:
- I turned it inside out in order to hit
units
only 20 times. JOIN owners
cannot beLEFT JOIN
, so I changed that.- I changed the
GROUP BY
to avoid usingunits
prematurely. - Possibly the
GROUP BY
is now redundant. - I changed the date range to make it easier to be generic.
- I augmented the
ORDER BY
to make it deterministic in case of dup counts. - Notice, below, how "composite" indexes can be helpful.
Indexes that may help:
contacts: INDEX(checkin, unit_id, owner_id)
contacts: INDEX(owner_id, checkin, unit_id)
owners: INDEX(group_id, id, name, email)
When adding those, remove any INDEXes
that start with the same columns. Example: contacts: INDEX(checkin)
CodePudding user response:
I haven't tested it, so I can't be sure exactly, but I think the main reason for the slow speed is the number of rows participating in group by.
So, you can try the following method to reduce the number of rows. (Since I can't test it, I'm not sure if this query will run correctly. I'm just trying to show you a way.)
SELECT B.*, `owners`.`name`, `owners`.`email`
FROM (
SELECT `units`.`id`, MAX(`contacts`.`owner_id`) AS owner_id, `units`.`description`, `units`.`address`, COUNT(*) AS contact_count
FROM (
SELECT *
FROM `contacts`
WHERE `contacts`.`checkin` BETWEEN '2021-10-01 00:00:00' AND '2021-10-31 23:59:59') as A
LEFT JOIN `units` ON A.`unit_id` = `units`.`id`
GROUP BY `units`.`id`) AS B
LEFT JOIN `owners` ON B.`owner_id` = `owners`.`id` AND `owners.group_id` = 6
ORDER BY `contact_count` DESC
LIMIT 20
Previously, I have a similar experience that I have to check the number of ad views and page visitors by date and time range, and in this way, the time was reduced.