Home > Mobile >  MySQL: DROP PARTITION vs DELETE FROM
MySQL: DROP PARTITION vs DELETE FROM

Time:05-28

For MySQL;

What's the difference between a DROP PARTITION vs a DELETE WHERE query? Which to use when?

My Scenario:

is the simple matter of deleting data older than a month from a few of my tables, at the end of every month. Tables fill at the slow rate of around 5 entries every second.

Pros / Cons

PARTITIONing with InnoDB requires me to disable my FOREIGN KEYs. So, DELETEing seems better for me. What, if any, advantages would PARTITIONing give me? Is it only the query's execution speed, a.k.a. performance? My deletion query would run only once every month so I don't see a problem with execution time.

CodePudding user response:

For what it's worth, dropping a partition is a data definition language statement. It happens quickly.

DELETE is a data manipulation statement. If you use InnoDB (you probably do) it's transactional. What does that mean?

When you issue the statement, for example,

DELETE FROM tbl WHERE start_date < CURDATE() - INTERVAL 1 MONTH

it means that other programs accessing your database will see either all the rows you're deleting, before your DELETE transaction, or none of them. The operation is called atomic or indivisible -- it appears to happen all at once.

If you delete many rows, this can put a big burden on your database server. It has to accumulate a transaction log containing all the deleted rows, then commit the transaction log all at once, possibly locking out other access.

Your question says you must delete about 13 megarows each month. If you do that with just one DELETE statement, you will put a heavy burden on your database. To reduce the burden when deleting unwanted historical rows, do the DELETE operation in chunks. How? You have a couple of choices.

  • @Akina suggested this: Do the deletion often enough that you don't delete a large number of rows at once, or

  • Do the deletion in batches of 1000 rows with a statement like this:

      DELETE FROM tbl 
       WHERE start_date < CURDATE() - INTERVAL 1 MONTH
       LIMIT 1000;
    

    and repeat the statement until it deletes no rows.

CodePudding user response:

"5 entries every second" = about 400K/day or 13M/month

DELETING 3M rows in a single statement:

  • Very slow for that many rows. (Not bad for under 1K rows)
  • Blocks most activity on the table
  • Builds a very big list of things for potential "rollback" (in case of power failure)

Scheduled DELETE

  • Why wait for the end of the month? Do up to 1000 every minute; that will keep up with much less overhead. Be sure to have a suitable index, else it won't work efficiently.
  • Rather than a monthly task, have a separate task that is continually running, deleting up to 200 rows, then moving on to the next table; eventually repeating. (If it is not keeping up, increase the "LIMIT 200"; if it is too invasive, add a SLEEP in the loop.)
  • Do not use cron or EVENT to schedule the delete. If, for whatever reason, a Delete run fails to finish before the next Delete, the job could become a train wreck. OTOH, a continually-running job needs a "keep-alive" task to restart it if it dies for any unforeseen reason.

DROP PARTITION

  • Because of how PARTITIONs are implemented as separate 'tables', DROP PARTITION is similar to DROP TABLE.
  • Very fast, regardless of the number of rows in the partition. (Well, the OS may show a slight sluggishness for huge files.)
  • Easy to do if using PARTITION BY RANGE(..).
  • I recommend that the number of partitions be between 20 and 50; adjust the deletion frequency accordingly. (1-month retention --> daily partitions; 3-month retention --> weekly partitions; 1-year retention --> monthly or weekly; etc.)
  • When partitioning a table, rethink all the indexes. You may be able to improve a few queries by making use of partition pruning. (But don't expect much.)
  • More info: Partition
  • PARTITIONing conflicts with FOREIGN KEYS and some UNIQUE keys. This puts a burden on the programmer to worry about (or ignore) the loss of those constraints.

Here's my blog on other big deletions techniques

  • Related