Does deleting the same data multiple times have a performance impact on a Cassandra cluster?

Time:02-03

I need to perform an automated housekeeping task, and this is my query: DELETE FROM sample_table WHERE id = '1';

This scheduled query is executed from multiple service instances, so the same delete is issued several times. Will this have a significant performance impact? What would be an appropriate way of testing this?

CodePudding user response:

Issuing multiple deletes for the same partition can have a significant impact on your cluster.

Remember that all writes in Cassandra (INSERT, UPDATE, DELETE) are inserts under the hood. Since Cassandra does not perform a read-before-write (with the exception of lightweight transactions), issuing a DELETE will insert a tombstone marker regardless of whether the data exists or has already been deleted.
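
As a rough illustration of that point, here is a toy Python model of an append-only write path in which a delete never checks whether the key exists. It is not Cassandra code; `ToyTable` and its methods are made up for this sketch, and real storage (memtables, SSTables, compaction) is far more involved.

```python
# Toy model only: NOT Cassandra internals, just an illustration of
# "a DELETE is a write that appends a tombstone without reading first".
from dataclasses import dataclass, field


@dataclass
class ToyTable:
    # Append-only log of (partition_key, kind, logical_timestamp) entries.
    log: list = field(default_factory=list)
    clock: int = 0

    def _append(self, key, kind):
        self.clock += 1
        self.log.append((key, kind, self.clock))

    def insert(self, key):
        self._append(key, "data")

    def delete(self, key):
        # No read-before-write: the tombstone is appended whether or not
        # the key exists or has already been deleted.
        self._append(key, "tombstone")

    def tombstones(self, key):
        return sum(1 for k, kind, _ in self.log if k == key and kind == "tombstone")


table = ToyTable()
table.insert("1")
for _ in range(5):   # e.g. five service instances run the same scheduled DELETE
    table.delete("1")
table.delete("2")    # key "2" was never inserted, yet a tombstone is still written

print(table.tombstones("1"))  # 5
print(table.tombstones("2"))  # 1
```

In this model every call to `delete` grows the log, which is the behaviour the paragraph above describes for repeated DELETEs.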

Every single DELETE you issue counts as a write request, so depending on how busy your cluster is, it may have a measurable impact on performance. Cheers!

CodePudding user response:

Erick's answer is pretty solid, but I'd just like to add that you're most likely to see performance issues at read time. That's because running:

SELECT * FROM sample_table WHERE id='1';

...will have to read ALL of the tombstones that those DELETEs wrote, from every SSTable file that contains them. With a table's default settings (gc_grace_seconds = 864000), tombstones stay around for 10 days (to ensure proper replication) before they can be removed by compaction.

So figure out how many times that DELETE happens per key over a 10-day period, and that's about how many tombstones Cassandra will have to reconcile at read time.
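
A back-of-the-envelope version of that calculation might look like the sketch below. The function name and the example numbers (4 instances, hourly runs) are hypothetical; only the 864000-second default for gc_grace_seconds comes from Cassandra itself. Treat the result as a rough upper bound, since compaction may merge duplicate tombstones for the same key before the window expires.

```python
# Rough estimate only; the inputs in the example call are hypothetical.
GC_GRACE_SECONDS = 864_000  # Cassandra's default gc_grace_seconds (10 days)
SECONDS_PER_DAY = 86_400


def tombstones_per_key(instances, runs_per_day_per_instance,
                       gc_grace_seconds=GC_GRACE_SECONDS):
    """Upper bound on tombstones for one key that are still too young to purge."""
    window_days = gc_grace_seconds / SECONDS_PER_DAY
    return int(instances * runs_per_day_per_instance * window_days)


# Example: 4 service instances, each running the housekeeping job hourly.
print(tombstones_per_key(instances=4, runs_per_day_per_instance=24))  # 960
```

Running the job less often, or from a single instance, shrinks this number proportionally.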
