So my colleague created this query which will run every hour on a table with 500K records.
Delete from table where timestamp> now() - interval 24 hour
I am having a feeling that this would be slower as it is computing time at each row, am I right? How can I optimize it?
Update
With 2.8 Million records it took around 12 seconds to delete the matched rows.
CodePudding user response:
I am having a feeling that this would be slower as it is computing time at each row, am I right?
No, the time calculation is done once at the start of the query. It is a constant value for the duration of the query.
https://dev.mysql.com/doc/refman/8.0/en/date-and-time-functions.html#function_now says:
NOW()
returns a constant time that indicates the time at which the statement began to execute.
https://dev.mysql.com/doc/refman/8.0/en/where-optimization.html says:
Constant expressions used by indexes are evaluated only once.
You also asked:
How can I optimize it?
The easiest thing to do is make sure there is an index on the timestamp
column.
A different solution is to use partitioning by the timestamp
column, and drop 1 partition per day. This blog has a description of this solution: http://mysql.rjweb.org/doc.php/partitionmaint
CodePudding user response:
- run the query more frequently (say, hourly)
- have an index on that column
PARTITION BY RANGE
and useDROP PARTITION
; suggest by hour; see Partition- More tips: http://mysql.rjweb.org/doc.php/deletebig