I have the following query to select users who have posted at least once each week for the past 4 weeks:
SELECT DISTINCT(user_id) FROM `posts` WHERE
user_id IN (SELECT DISTINCT(user_id) FROM `posts` WHERE post_date > ((UNIX_TIMESTAMP()- (604800*0) ) - 604800) AND post_date <= (UNIX_TIMESTAMP()- (604800*0))) AND
user_id IN (SELECT DISTINCT(user_id) FROM `posts` WHERE post_date > ((UNIX_TIMESTAMP()- (604800*1) ) - 604800) AND post_date <= (UNIX_TIMESTAMP()- (604800*1))) AND
user_id IN (SELECT DISTINCT(user_id) FROM `posts` WHERE post_date > ((UNIX_TIMESTAMP()- (604800*2) ) - 604800) AND post_date <= (UNIX_TIMESTAMP()- (604800*2))) AND
user_id IN (SELECT DISTINCT(user_id) FROM `posts` WHERE post_date > ((UNIX_TIMESTAMP()- (604800*3) ) - 604800) AND post_date <= (UNIX_TIMESTAMP()- (604800*3)))
If I run the query with just 1 of any of those 4 sub-queries, it is fast.
If I run any of those 4 sub-queries individually, they are fast.
However, the moment I add more than 1 of those subqueries together, it crashes MySQL.
How can I fix this so it works considering the individual parts are fast?
(I should add that in this particular case, I cannot add any more indexes, so the solution should not involve indexing).
CodePudding user response:
Try to EXPLAIN <query>
to see the execution plan, each subquery could read many rows of data.
But I suggest to re-write the query for this particular purpose, it needs to GROUP BY
user and COUNT DISTINCT
weeks as 4, read the table once is sufficient.
SELECT user_id
FROM `posts`
WHERE post_date > ((UNIX_TIMESTAMP()- (604800*3) ) - 604800)
GROUP BY user_id
HAVING COUNT(DISTINCT
(CASE WHEN post_date > ((UNIX_TIMESTAMP()- (604800*0) ) - 604800) AND post_date <= (UNIX_TIMESTAMP()- (604800*0)) THEN 1
WHEN post_date > ((UNIX_TIMESTAMP()- (604800*1) ) - 604800) AND post_date <= (UNIX_TIMESTAMP()- (604800*1)) THEN 2
WHEN post_date > ((UNIX_TIMESTAMP()- (604800*2) ) - 604800) AND post_date <= (UNIX_TIMESTAMP()- (604800*2)) THEN 3
ELSE 4 END)) = 4
About the query
WHERE post_date > ((UNIX_TIMESTAMP()- (604800*3) ) - 604800)
filter applicable data onlyCASE WHEN post_date > ((UNIX_TIMESTAMP ...
transform thepost_date
into week 1, 2, 3 or 4
Now if there are data like below
--------- ------
| user_id | week |
--------- ------
| 1 | 1 |
| 1 | 2 |
| 1 | 3 |
| 1 | 3 |
| 2 | 1 |
| 2 | 2 |
| 2 | 3 |
| 2 | 4 |
--------- ------
There are four rows for each user_id
, but user 1 appears in week 1,2,3, COUNT DISTINCT week
is 3, user 2 will count 4
CodePudding user response:
I think you may be able to do this in one pass through all those posts
rows. Here's a suggestion.
I assume your post_date
column is some kind of number, not a DATETIME or TIMESTAMP.
First, we need an expression to determine how many weeks ago each post occurred. That's this. It yields an integer. 0 for posts less than a week old, 1 for posts between one and two weeks old etc.
TRUNCATE( (UNIX_TIMESTAMP() - date_time ) / 608400, 0)
Next we need a subquery to count posts and users by week.
SELECT
COUNT (*) post_count,
user_id,
TRUNCATE((UNIX_TIMESTAMP() - date_time ) / 608400, 0) age
FROM posts
WHERE date_time >= UNIX_TIMESTAMP() - (4 * 608400)
AND date_time <= UNIX_TIMESTAMP()
GROUP BY user_id, age
This result set contains one row for each user_id and week.
Finally, summarize the subquery and use HAVING to choose user_id values that show up in it four times.
SELECT user_id
FROM ( /* that subquery goes here */ ) ages
GROUP BY user_id
HAVING COUNT(*) = 4
This is efficient because it can gather all four weeks' work of data from the posts
table in one scan. If you happen to have an index on (date_time, user_id)
MySql does it with an index scan and it will be surprisingly fast.
It's also sweet if your requirement changes to, say, six weeks. Change the 4
constants to 6
and you can use the same query.
Putting it together:
SELECT user_id FROM (
SELECT
COUNT (*) post_count,
user_id,
TRUNCATE((UNIX_TIMESTAMP() - date_time) / 608400, 0) age
FROM posts
WHERE date_time >= UNIX_TIMESTAMP() - (4 * 608400)
AND date_time <= UNIX_TIMESTAMP()
GROUP BY user_id, age
) ages
GROUP BY user_id
HAVING COUNT(*) = 4