Home > Software design >  Does deleting rows from table effect on db performance?
Does deleting rows from table effect on db performance?

Time:11-15

As a MySQL database user,

I'm working on a script using MySQL database with an auto-increment primary key tables, that users may need to remove (lots of) data rows as mistaken, duplicated, canceled data and so on.

for now, I use a tinyint last col as 'delete' for each table and update the rows to delete=1 instead of deleting the row.

considering the deleted data as not important data, which way do you suggest to have a better database and performance? does deleting (maybe lots of) rows every day affect select queries for large tables? is it better to delete the rows instantly? or keep the rows using 'delete' col and delete them for example monthly then re-index the data?

I've searched about this but most of the results were based on personal opinions or preferred ones and not referenced or tested data. thanks.

CodePudding user response:

What percentage of the table is "deleted"?

If it is less than, say, 20%, it would be hard to measure any difference between a soft "deleted=1" and a hard "DELETE FROM tbl". The disk space would probably be the same. A 16KB block would either have soft-deleted rows to ignore, or the block would be not "full".

Let's say 80% of the rows have been deleted. Now there are some noticeable differences.

In the "soft-delete" case, a SELECT will be looking at 5 rows to find only 1. While this sounds terrible, it does not translate into 5 times the effort. There is overhead for fetching a block; if it contains 4 soft-deleted rows and 1 useful row, that overhead is shared. Once a useful row is found, there is overhead to deliver that row to the client, but that applies only to the 1 row.

In the "hard-delete" case, blocks are sometimes coalesced. That is, when two "adjacent" blocks become less than half full, they may be combined into a single block. (Or so the documentation says.) This helps to cut down on the number of blocks that need to be touched. But it does not shrink the disk space -- hard-deleted rows leave space that can be reused; deleted blocks can be reused. Blocks are not returned to the OS.

A "point-query" is a SELECT where you specify exactly the row you want (eg, WHERE id = 123). That will be very fast with either type of delete. The only possible change is if the BTree is a different depth. But even if 80% of the rows are deleted, the BTree is unlikely to change in depth. You need to get to about 99% deleted before the depth changes. (A million rows has a depth of about 3; 100M -> 4.)

"Range queries (eg, WHERE blah BETWEEN ... AND ...) will notice some degradation if most are soft-deleted -- but, as already mentioned, there is a slight degradation in either deletion method.

So, is this my "opinion"? Yes. But it is based on an understanding of how InnoDB tables work. And it is based on "experience" in the sense that I have detected nothing to significantly shake this explanation in about 19 years of using InnoDB.

Further... With hard-delete, you have the option of freeing up the free space with OPTIMIZE TABLE. But I have repeatedly said "don't bother" and elaborated on why.

On the other hand, if you need to delete a big chunk of a table (either one-time or repeatedly), see my blog on efficient techniques: http://mysql.rjweb.org/doc.php/deletebig

CodePudding user response:

The simple answer is no. Because DBMS systems are already designed to make changes at any time but system performance is important. Sometimes it's will affect a little bit. But no need to care it

  • Related