Home > OS >  Giant unpartitioned MySQL table issues
Giant unpartitioned MySQL table issues

Time:06-04

I have a MySQL table which is about 8TB in size. As you can imagine, querying is horrendous.

I am thinking about:

  1. Create a new table with partitions
  2. Loop through a series of queries to dump data into those partitions

But the loop will require lots of queries to be submitted & each will be REALLY slow.

Is there a better way to do this? Repartitioning a production database in-situ isn't going to work - this seemed like an OK option, but slow

And is there a tool that will make life easier? Rather than a Python job looping & submitting jobs?

Thanks a lot in advance

CodePudding user response:

You could use pt-online-schema-change. This free tool allows you to partition the table with an ALTER TABLE statement, but it does not block clients from using the table while it's restructuring it.

Another useful tool could be pt-archiver. You would create a new table with your partitioning idea, then pt-archiver to gradually copy or move data from the old table to the new table.

Of course try out using these tools in a test environment on a much smaller table first, so you get some practice using them. Do not try to use them for the first time on your 8TB table.

Regardless of what solution you use, you are going to need enough storage space to store the entire dataset twice, plus binary logs. The old table will not shrink, even as you remove data from it. So I hope your filesystem is at least 24TB. Or else the new table should be stored on a different server (or ideally several other servers).

It will also take a long time no matter which solution you use. I expect at least 4 weeks, and perhaps longer if you don't have a very powerful server with direct-attached NVMe storage.

If you use remote storage (like Amazon EBS) it may not finish before you retire from your career!

In my opinion, 8TB for a single table is a problem even if you try partitioning. Partitioning doesn't magically fix performance, and could make some queries worse. Do you have experience with querying partitioned tables? And you understand how partition pruning works, and when it doesn't work?

Before you choose partitioning as your solution, I suggest you read the whole chapter on partitioning in the MySQL manual: https://dev.mysql.com/doc/refman/8.0/en/partitioning.html, especially the page on limitations: https://dev.mysql.com/doc/refman/8.0/en/partitioning-limitations.html Then try it out with a smaller table.

A better strategy than partitioning for data at this scale is to split the data into shards, and store each shard on one of multiple database servers. You need a strategy for adding more shards as I assume the data will continue to grow.

  • Related