We have 2 SQL queries:
- Get active campaigns (234 rows, 0.0007 seconds)
SELECT id FROM campaigns WHERE campaigns.is_active = 1
- Get today's clicks for a user (17 rows, 0.0772 seconds)
SELECT id, campaign_id FROM clicks WHERE user_id = 1 AND created > '2022-06-23 00:00:00'
Both are fast and return a small number of rows.
Now I combine both to get active campaigns amount of clicks today for a user:
SELECT count(clicks.id),
campaigns.id
FROM campaigns
LEFT JOIN clicks
ON ( clicks.campaign_id = campaigns.id
AND clicks.user_id = 1
AND clicks.created > '2022-06-23 00:00:00')
WHERE campaigns.is_active = 1
GROUP BY campaigns.id
Returns 234 rows in 8 seconds runtime.
Getting the campaign list (234 rows) takes 0.0007. Getting the clicks list (17 rows) takes 0.0772 seconds. But to assign the 17 clicks to the 234 campaigns suddenly takes 8 seconds? Why is it so slow? How can I fix it?
If I change from LEFT JOIN to INNER JOIN it takes only 0.09 seconds, but it's not the return I need.
The clicks table has around 21m rows with 50k new rows a day. It has a single index on each of these columns: user_id, campaign_id, created
CREATE TABLE:
CREATE TABLE `clicks` (
`id` int(11) UNSIGNED NOT NULL,
`user_id` int(7) UNSIGNED NOT NULL,
`campaign_id` int(11) UNSIGNED NOT NULL,
`created` datetime NOT NULL
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb3;
ALTER TABLE `clicks`
ADD PRIMARY KEY (`id`),
ADD KEY `user_id` (`user_id`),
ADD KEY `campaign_id` (`campaign_id`),
ADD KEY `created` (`created`);
CREATE TABLE `campaigns` (
`id` int(11) UNSIGNED NOT NULL,
`is_active` tinyint(4) NOT NULL DEFAULT 0
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb3;
ALTER TABLE `campaigns`
ADD PRIMARY KEY (`id`),
ADD KEY `is_active` (`is_active`);
CodePudding user response:
Ah, the well-known antipattern of a one-column index on every column strikes again.
Try creating a multicolumn covering index on your large clicks table, tuned to your query.
ALTER TABLE clicks
ADD INDEX campaign_created_user (campaign_id, created, user_id);
See how your query performas with that one. If it's OK you're done. If not, drop that index and try this one.
ALTER TABLE clicks
DROP INDEX campaign_created_user,
ADD INDEX user_campaign_created (user_id, campaign_id, created);
Why will one of these help you?
MySQL indexes are BTREE indexes. Simply put, they can be accessed either randomly or in order. So your clicks.user_id = 1 AND clicks.created > '2022-06-23 00:00:00'
conditions random-access the index I suggest to the first eligible row, then scan it sequentially. That's a fast way to satisfy a query.
I suggest a second index in case the left join somehow leads with the user_id
rather than the campaign_id
. EXPLAIN output will let you know what's going on there.
Pro tip Avoid putting indexes on columns unless you know your queries need them. Putting single-column indexes on every column doesn't substitute for correctly designed multicolumn indexes.
CodePudding user response:
My assumption is that you're multiplying the number of the operations needed by alot just because every row of 234 needs to be checked by 17 times with three different types of statements.
Try something like, but I don't know if it's the best suited for your case:
SELECT count(clicks.id),
campaigns.id
FROM campaigns
LEFT JOIN clicks ON ( clicks.campaign_id = campaigns.id )
WHERE campaigns.is_active = 1 AND (clicks.user_id = 1 AND clicks.created > '2022-06-23 00:00:00')
GROUP BY campaigns.id
But consider that 8 seconds with these small numbers is a weird processing time and should be expected way less.
So If you try this query and still get the same result, there could potentially a problem of DB settings of some sort
CodePudding user response:
Better indexes
Query 1:
The existing INDEX(is_active)
is adequate.
Query 2:
INDEX(user_id, created, campaign_id) -- in this order!
and drop the existing INDEX(user_id)
Query 3:
clicks: INDEX(campaign_id, user_id, created) -- note difference
campaigns: INDEX(is_active, id) -- (as above); may not get used; that's OK
and drop the existing INDEX(campaign_id)
This reformulation may run faster; I don't know for sure:
SELECT ( SELECT COUNT(*)
FROM clicks AS cl
WHERE cl.campaign_id = cg.id
AND cl.user_id = 1
AND cl.created > '2022-06-23 00:00:00'
) AS ct,
cg.id
FROM campaigns as cg
WHERE cg.is_active = 1
GROUP BY cg.id
50k new rows a day
That begs for building maintaining a Summary Tables