Trying to complete a web process, I'm getting the error canceling statement due to statement timeout
. Debugging the codebase it turns out that the below query is timing out due to large data set. I appreciate any suggestions on how to increase the below query performance.
select userid, max(recent_activity_date) recent_activity_date
from (
SELECT id AS userid,
recent_logged_in AS recent_activity_date
FROM user
WHERE recent_logged_in > now() - cast('10 days' AS INTERVAL)
UNION
SELECT userid AS userid, max(recentaccessed) AS recent_activity_date
FROM tokencreds
WHERE recentaccessed > now() - cast('10 days' AS INTERVAL)
GROUP BY userid
) recent_activity
WHERE EXISTS(select 1 from user where id = userid and not deleted)
group by userid
order by userid;
Index per table:
- Table user:
user_recent_logged_in on user (recent_logged_in)
- Table tokencreds:
tokencreds_userid_token on tokencreds (userid, token)
. tokencreds_userid_token is unique.
CodePudding user response:
A lot depends on the 'layout' of your data. Are there a lot of records 'hit'? Are there a lot of records in users? in tokencreds? etc...
Personally I would go for this:
SELECT userid, max(recent_activity_date) recent_activity_date _user
FROM (
SELECT id AS userid, MAX(recent_logged_in) AS recent_activity_date
FROM user
WHERE recent_logged_in > now() - cast('10 days' AS INTERVAL)
AND NOT deleted
UNION ALL
SELECT userid AS userid, MAX(recentaccessed) AS recent_activity_date
FROM tokencreds
WHERE recentaccessed > now() - cast('10 days' AS INTERVAL)
AND EXISTS(SELECT * FROM user WHERE id = userid AND NOT deleted)
) recent_activity
GROUP BY userid
ORDER BY userid;
-- indexes 'needed' on :
CREATE INDEX idx_userid_not_deleted ON user (userid) WHERE NOT deleted;
CREATE INDEX idx_recent_logged_in_user_id_not_deleted ON user (recent_logged_in, userid) WHERE not deleted;
CREATE INDEX idx_recentaccessed_user_id ON tokencreds (recentaccessed, userid);
but YMMV. To get a better idea you really should provide the full EXPLAIN ANALYZE result otherwise we're just flying blind and guessing here. Could be the system will refuse to use any of the suggested indexes in which case you better remove then again off course.
Reasoning:
- the
UNION
will cause an implicit distinct on the sub-select which you don't really need as teMAX()
andGROUP BY
later on will pretty much do the same so why do things twice? - better to filter 'as soon as possible' rather then filter in the end IMHO (**)
**: Do note that the results here ARE going to be different! (but I think mine are 'better'). E.g. suppose you have 3 records for user_id 5
- user_id = 5, deleted = true, recent_activity_date = Dec 10
- user_id = 5, deleted = false, recent_activity_date = Dec 8
- user_id = 5, deleted = false, recent_activity_date = Dec 5
Ignoring the tokencreds
table, the result for user_id 5 will be Dec 10 in your version while in mine it will be Dec 8. Check your requirements on which one you want!
edit: mistake in suggested indexes
CodePudding user response:
Get rid of the union
, and the exists()
, and combine them into a straight join
:
SELECT x.userid
, GREATEST(x.recent_logged_in, x.recent_activity_date ) AS recent_activity_date
FROM (
SELECT u.id AS userid
, u.recent_logged_in
, MAX(t.recentaccessed) AS recent_activity_date
FROM users u
LEFT JOIN tokencreds AS t ON t.userid = u.id
WHERE NOT u.deleted
AND (
u.recent_logged_in > now() - cast('10 days' AS INTERVAL)
OR t.recentaccessed > now() - cast('10 days' AS INTERVAL)
)
GROUP BY u.id
) x
ORDER by userid;