Home > Mobile >  2 simple SQL queries become slow when merged
2 simple SQL queries become slow when merged

Time:06-24

We have 2 SQL queries:

  1. Get active campaigns (234 rows, 0.0007 seconds)
SELECT  id FROM   campaigns WHERE  campaigns.is_active = 1
  1. Get today's clicks for a user (17 rows, 0.0772 seconds)
SELECT id, campaign_id FROM clicks WHERE user_id = 1 AND created > '2022-06-23 00:00:00'

Both are fast and return a small number of rows.

Now I combine both to get active campaigns amount of clicks today for a user:

SELECT count(clicks.id),
       campaigns.id
FROM   campaigns
       LEFT JOIN clicks
              ON ( clicks.campaign_id = campaigns.id
                   AND clicks.user_id = 1
                   AND clicks.created > '2022-06-23 00:00:00')
WHERE  campaigns.is_active = 1
GROUP  BY campaigns.id

Returns 234 rows in 8 seconds runtime.

Getting the campaign list (234 rows) takes 0.0007. Getting the clicks list (17 rows) takes 0.0772 seconds. But to assign the 17 clicks to the 234 campaigns suddenly takes 8 seconds? Why is it so slow? How can I fix it?

If I change from LEFT JOIN to INNER JOIN it takes only 0.09 seconds, but it's not the return I need.

The clicks table has around 21m rows with 50k new rows a day. It has a single index on each of these columns: user_id, campaign_id, created

CREATE TABLE:

CREATE TABLE `clicks` (
  `id` int(11) UNSIGNED NOT NULL,
  `user_id` int(7) UNSIGNED NOT NULL,
  `campaign_id` int(11) UNSIGNED NOT NULL,
  `created` datetime NOT NULL
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb3;

ALTER TABLE `clicks`
  ADD PRIMARY KEY (`id`),
  ADD KEY `user_id` (`user_id`),
  ADD KEY `campaign_id` (`campaign_id`),
  ADD KEY `created` (`created`);
  
  
  

CREATE TABLE `campaigns` (
    `id` int(11) UNSIGNED NOT NULL,
    `is_active` tinyint(4) NOT NULL DEFAULT 0
  ) ENGINE=InnoDB DEFAULT CHARSET=utf8mb3;
  
ALTER TABLE `campaigns`
    ADD PRIMARY KEY (`id`),
    ADD KEY `is_active` (`is_active`);

EXPLAIN SELECT: explain select

CodePudding user response:

Ah, the well-known antipattern of a one-column index on every column strikes again.

Try creating a multicolumn covering index on your large clicks table, tuned to your query.

ALTER TABLE clicks
 ADD INDEX campaign_created_user (campaign_id, created, user_id);

See how your query performas with that one. If it's OK you're done. If not, drop that index and try this one.

ALTER TABLE clicks
 DROP INDEX campaign_created_user,
 ADD INDEX user_campaign_created (user_id, campaign_id, created);

Why will one of these help you?

MySQL indexes are BTREE indexes. Simply put, they can be accessed either randomly or in order. So your clicks.user_id = 1 AND clicks.created > '2022-06-23 00:00:00' conditions random-access the index I suggest to the first eligible row, then scan it sequentially. That's a fast way to satisfy a query.

I suggest a second index in case the left join somehow leads with the user_id rather than the campaign_id. EXPLAIN output will let you know what's going on there.

Pro tip Avoid putting indexes on columns unless you know your queries need them. Putting single-column indexes on every column doesn't substitute for correctly designed multicolumn indexes.

CodePudding user response:

My assumption is that you're multiplying the number of the operations needed by alot just because every row of 234 needs to be checked by 17 times with three different types of statements.

Try something like, but I don't know if it's the best suited for your case:

SELECT count(clicks.id),
       campaigns.id
FROM   campaigns
       
LEFT JOIN clicks ON ( clicks.campaign_id = campaigns.id )

WHERE  campaigns.is_active = 1 AND (clicks.user_id = 1 AND clicks.created > '2022-06-23 00:00:00')
GROUP  BY campaigns.id

But consider that 8 seconds with these small numbers is a weird processing time and should be expected way less.

So If you try this query and still get the same result, there could potentially a problem of DB settings of some sort

CodePudding user response:

Better indexes

Query 1:

The existing INDEX(is_active) is adequate.

Query 2:

INDEX(user_id, created, campaign_id)  -- in this order!

and drop the existing INDEX(user_id)

Query 3:

clicks:  INDEX(campaign_id, user_id, created)  -- note difference
campaigns:  INDEX(is_active, id)   -- (as above); may not get used; that's OK

and drop the existing INDEX(campaign_id)

This reformulation may run faster; I don't know for sure:

SELECT  ( SELECT  COUNT(*)
            FROM  clicks AS cl
            WHERE  cl.campaign_id = cg.id
              AND  cl.user_id = 1
              AND  cl.created > '2022-06-23 00:00:00' 
        ) AS ct,
        cg.id
    FROM  campaigns as cg
    WHERE  cg.is_active = 1
    GROUP BY  cg.id 

50k new rows a day

That begs for building maintaining a Summary Tables

  • Related