Home > other >  SQL statement timing out on large data set
SQL statement timing out on large data set

Time:03-01

Trying to complete a web process, I'm getting the error canceling statement due to statement timeout. Debugging the codebase it turns out that the below query is timing out due to large data set. I appreciate any suggestions on how to increase the below query performance.

select userid, max(recent_activity_date) recent_activity_date 
from (
      SELECT id AS userid,
        recent_logged_in AS recent_activity_date
      FROM user
      WHERE recent_logged_in > now() - cast('10 days' AS INTERVAL)
      UNION
      SELECT userid AS userid, max(recentaccessed) AS recent_activity_date
      FROM tokencreds
      WHERE recentaccessed > now() - cast('10 days' AS INTERVAL)
      GROUP BY userid
) recent_activity 
  WHERE EXISTS(select 1 from user where id = userid and not deleted) 
  group by userid 
  order by userid;  

Index per table:

  • Table user: user_recent_logged_in on user (recent_logged_in)
  • Table tokencreds: tokencreds_userid_token on tokencreds (userid, token). tokencreds_userid_token is unique.

CodePudding user response:

A lot depends on the 'layout' of your data. Are there a lot of records 'hit'? Are there a lot of records in users? in tokencreds? etc...

Personally I would go for this:

SELECT userid, max(recent_activity_date) recent_activity_date _user
 FROM (
        SELECT id AS userid, MAX(recent_logged_in) AS recent_activity_date
          FROM user
         WHERE recent_logged_in > now() - cast('10 days' AS INTERVAL)
           AND NOT deleted
         UNION ALL
        SELECT userid AS userid, MAX(recentaccessed) AS recent_activity_date
          FROM tokencreds
         WHERE recentaccessed > now() - cast('10 days' AS INTERVAL)
           AND EXISTS(SELECT * FROM user WHERE id = userid AND NOT deleted)
) recent_activity 
  GROUP BY userid 
  ORDER BY userid; 
  
  
 -- indexes 'needed' on :
 CREATE INDEX idx_userid_not_deleted ON user (userid) WHERE NOT deleted;
 CREATE INDEX idx_recent_logged_in_user_id_not_deleted ON user (recent_logged_in, userid) WHERE not deleted;
 CREATE INDEX idx_recentaccessed_user_id ON tokencreds (recentaccessed, userid);
 

but YMMV. To get a better idea you really should provide the full EXPLAIN ANALYZE result otherwise we're just flying blind and guessing here. Could be the system will refuse to use any of the suggested indexes in which case you better remove then again off course.

Reasoning:

  • the UNION will cause an implicit distinct on the sub-select which you don't really need as te MAX() and GROUP BY later on will pretty much do the same so why do things twice?
  • better to filter 'as soon as possible' rather then filter in the end IMHO (**)

**: Do note that the results here ARE going to be different! (but I think mine are 'better'). E.g. suppose you have 3 records for user_id 5

  • user_id = 5, deleted = true, recent_activity_date = Dec 10
  • user_id = 5, deleted = false, recent_activity_date = Dec 8
  • user_id = 5, deleted = false, recent_activity_date = Dec 5

Ignoring the tokencreds table, the result for user_id 5 will be Dec 10 in your version while in mine it will be Dec 8. Check your requirements on which one you want!

edit: mistake in suggested indexes

CodePudding user response:

Get rid of the union, and the exists(), and combine them into a straight join:


SELECT x.userid
        , GREATEST(x.recent_logged_in, x.recent_activity_date ) AS recent_activity_date
FROM (
      SELECT u.id AS userid
        , u.recent_logged_in
        , MAX(t.recentaccessed) AS recent_activity_date
      FROM users u
      LEFT JOIN tokencreds AS t ON t.userid = u.id
      WHERE NOT u.deleted  
      AND (
        u.recent_logged_in > now() - cast('10 days' AS INTERVAL)
        OR t.recentaccessed > now() - cast('10 days' AS INTERVAL)
        )
      GROUP BY u.id
        ) x
  ORDER by userid;  
  • Related