Home > Software engineering >  Simple JOIN slow, very fast with STRAIGHT_JOIN but not for all cases
Simple JOIN slow, very fast with STRAIGHT_JOIN but not for all cases

Time:11-23

To place a video in one or more categories I have the following tables:

  • Video (id, url, title, viewCount) ~1,000,000 rows
  • VideoCategory (id, videoId, categoryId ~6,000,000 rows
  • Category (id, name) ~200 rows

Index on VideoCategory(categoryId, videoId)
Unique Index on Category(name)

The following query, to get 10 most viewed videos in the 'Cars' category is too slow (~5.5 sec). The 'Cars' category contains 200,000 videos.

SELECT v.* FROM Video v
  JOIN VideoCategory vc ON vc.videoId = v.id
  JOIN Category c ON vc.categoryId = c.id
  WHERE c.name = 'Cars' 
  ORDER BY v.viewCount DESC
LIMIT 10

When querying a category that only contains 100 videos it takes ~0.05 sec.

EXPLAIN of query for 'Cars' category.

 ------ ------------- ------- -------- ------------------------------------ -------------------- --------- ----------------------- -------- ---------------------------------------------- 
| id   | select_type | table | type   | possible_keys                      | key                | key_len | ref                   | rows   | Extra                                        |
 ------ ------------- ------- -------- ------------------------------------ -------------------- --------- ----------------------- -------- ---------------------------------------------- 
|    1 | SIMPLE      | c     | const  | name_UNIQUE                        | name_UNIQUE        | 322     | const                 | 1      | Using index; Using temporary; Using filesort |
|    1 | SIMPLE      | vc    | ref    | fk_Category_idx,category_video_idx | category_video_idx | 8       | const                 | 493988 | Using index                                  |
|    1 | SIMPLE      | v     | eq_ref | PRIMARY                            | PRIMARY            | 8       | VideoDB.vc.videoId    | 1      | Using where                                  |
 ------ ------------- ------- -------- ------------------------------------ -------------------- --------- ----------------------- -------- ---------------------------------------------- 

How can I speed it up (hopefully 0.1 sec. or less)? Leaving out the ORDER BY has a massive impact but it obviously doesn't give me the result I want.

Update
I played around with STRAIGHT_JOIN and observed some interesting things with the following query:

SELECT v.* FROM Video v
STRAIGHT_JOIN VideoCategory vc ON v.id = vc.videoId
WHERE vc.categoryId = (SELECT id FROM Category WHERE name = 'Cars')
ORDER BY v.viewCount ASC
LIMIT 10

It returns in 0.011 sec! I also removed the JOIN Category c and replaced it with WHERE vc.categoryId = (SELECT id FROM Category WHERE name = 'Cars') which returns immediately with 0 results if the Category.name doesn't exist.

Unfortunately just throwing in a STRAIGHT_JOIN doesn't fix everything, it is only fast(er) for categories with ~1000 videos, the more videos the faster it seems. For a category containing less than 100 videos it gets extremely slow, taking out the STRAIGHT_JOIN makes it very fast again.

For a query this simple I expected the planner to find the optimal path. What's going on here?

Another observation is that changing the order from ASC to DESC will make the query slower again for some categories but not for other? (e.g. ASC will take 0.01 sec. and DESC will take 0.8 sec.)

CodePudding user response:

That order by is going to cost a lot for 200k records. One idea is to create a top 10 table that gets updated periodically, so you can just show entries from the new top 10 table which would be way quicker for the user and easier on the db load.

CodePudding user response:

The optimal resolution of your origin query is to lookup the Category.id for "cars" which is done.

The query plan then falls over due to the lack of indexing on VideoCategory.

As I assume lookups are from categoryId -> videoId there should be a categoryId index. But ideally as a joining table (categoryId, videoId) is probably the most suitable composite primary key that would improve your query. If there reverse resolution from videoId -> categoryId is required, a secondary key videoId. The primary key is already included at the end of secondary keys.

The last problem of viewCount is a hard one. Some query optimization like rowid_filter might help.

  • Related