Improving slow MariaDB query performance-CodePudding

I have what seems to be a fairly straightforward query, but it is super slow and I would like to improve its performance if I can.

SELECT `contacts`.`unit_id`, `contacts`.`owner_id`, `units`.`description`, 
  `units`.`address`, `owners`.`name`, `owners`.`email`, COUNT(*) AS contact_count
FROM `contacts`
LEFT JOIN `units` ON `contacts`.`unit_id` = `units`.`id`
LEFT JOIN `owners` ON `contacts`.`owner_id` = `owners`.`id`
WHERE `owners.group_id` = 6
  AND `contacts`.`checkin` BETWEEN '2021-10-01 00:00:00' AND '2021-10-31 23:59:59'
GROUP BY `units`.`id`
ORDER BY `contact_count` DESC
LIMIT 20;

I'm just trying to get the units with the most contacts in a given date range, and belonging to a certain group of owners.

 ------ ------------- ---------- -------- -------------------------------------------------- --------------------------- --------- ------------------------- ------ --------------------------------- 
| id   | select_type | table    | type   | possible_keys                                    | key                       | key_len | ref                     | rows | Extra                           |
 ------ ------------- ---------- -------- -------------------------------------------------- --------------------------- --------- ------------------------- ------ --------------------------------- 
|    1 | SIMPLE      | owners   | ref    | PRIMARY,owners_group_id_foreign                  | owners_group_id_foreign   | 4       | const                   | 1133 | Using temporary; Using filesort |
|    1 | SIMPLE      | contacts | ref    | contacts_checkin_index,contacts_owner_id_foreign | contacts_owner_id_foreign | 4       | appdb.owners.id         | 1145 | Using where                     |
|    1 | SIMPLE      | units    | eq_ref | PRIMARY                                          | PRIMARY                   | 4       | appdb.contacts.unit_id  |    1 |                                 |
 ------ ------------- ---------- -------- -------------------------------------------------- --------------------------- --------- ------------------------- ------ ---------------------------------

As near as I can tell, everything that should be indexed is:

CREATE TABLE `contacts` (
  `id` int(10) unsigned NOT NULL AUTO_INCREMENT,
  `owner_id` int(10) unsigned NOT NULL,
  `unit_id` int(10) unsigned NOT NULL,
  `terminal_id` int(10) unsigned NOT NULL,
  `checkin` datetime NOT NULL
  PRIMARY KEY (`id`),
  KEY `contacts_checkin_index` (`checkin`),
  KEY `contacts_unit_id_foreign` (`unit_id`),
  KEY `contacts_terminal_id_foreign` (`terminal_id`),
  KEY `contacts_owner_id_foreign` (`owner_id`),
  CONSTRAINT `contacts_unit_id_foreign` FOREIGN KEY (`unit_id`) REFERENCES `units` (`id`) ON DELETE CASCADE ON UPDATE CASCADE,
  CONSTRAINT `contacts_terminal_id_foreign` FOREIGN KEY (`terminal_id`) REFERENCES `terminals` (`id`) ON DELETE CASCADE ON UPDATE CASCADE,
  CONSTRAINT `contacts_owner_id_foreign` FOREIGN KEY (`owner_id`) REFERENCES `owners` (`id`) ON DELETE CASCADE ON UPDATE CASCADE
) ENGINE=InnoDB AUTO_INCREMENT=25528530 DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_ci

The contacts table currently has about 10 million rows, and this query takes about 4 minutes to run. Is this anything that can be improved significantly or am I just bumping up against limitations of my hardware at this point?

CodePudding user response：

SELECT  sub.unit_id, sub.owner_id, u.`description`, u.`address`,
        sub.name, sub.email,
        sub.contact_count
    FROM  
        ( SELECT  c.`unit_id`, c.`owner_id`,
                  u.`description`, u.`address`,
                  COUNT(*) AS contact_count
            FROM  `contacts` AS c
            JOIN  `owners` AS o  ON c.`owner_id` = o.`id`
            WHERE  `o.group_id` = 6
              AND  c.`checkin` >= '2021-10-01'
              AND  c.`checkin` <  '2021-10-01   INTERVAL 1 MONTH
            GROUP BY  c.`unit_id`
            ORDER BY  `contact_count` DESC
            LIMIT  20 
        ) AS sub
    LEFT JOIN  `units` AS u  ON c.`unit_id` = u.`id`
    ORDER BY  `contact_count` DESC, c.unit_id DESC;

Notes:

I turned it inside out in order to hit units only 20 times.
JOIN owners cannot be LEFT JOIN, so I changed that.
I changed the GROUP BY to avoid using units prematurely.
Possibly the GROUP BY is now redundant.
I changed the date range to make it easier to be generic.
I augmented the ORDER BY to make it deterministic in case of dup counts.
Notice, below, how "composite" indexes can be helpful.

Indexes that may help:

contacts:  INDEX(checkin, unit_id, owner_id)
contacts:  INDEX(owner_id, checkin, unit_id)
owners:  INDEX(group_id, id,  name, email)

When adding those, remove any INDEXes that start with the same columns. Example: contacts: INDEX(checkin)

CodePudding user response：

I haven't tested it, so I can't be sure exactly, but I think the main reason for the slow speed is the number of rows participating in group by.

So, you can try the following method to reduce the number of rows. (Since I can't test it, I'm not sure if this query will run correctly. I'm just trying to show you a way.)

SELECT B.*, `owners`.`name`, `owners`.`email`
FROM (
   SELECT `units`.`id`, MAX(`contacts`.`owner_id`) AS owner_id, `units`.`description`, `units`.`address`, COUNT(*) AS contact_count
   FROM (
        SELECT *
        FROM `contacts`
        WHERE `contacts`.`checkin` BETWEEN '2021-10-01 00:00:00' AND '2021-10-31 23:59:59') as A
   LEFT JOIN `units` ON A.`unit_id` = `units`.`id`
   GROUP BY `units`.`id`) AS B
LEFT JOIN `owners` ON B.`owner_id` = `owners`.`id` AND `owners.group_id` = 6
ORDER BY `contact_count` DESC
LIMIT 20

Previously, I have a similar experience that I have to check the number of ad views and page visitors by date and time range, and in this way, the time was reduced.