How to partition huge (15 TB) existing table in SQL server without creating clustered index

Time:10-21

I am trying to partition a huge table in SQL Server: it is 15 TB and holds millions of records. The main idea is to store incoming data on different disks by creating a new filegroup, because the current disk cannot be expanded any further.

I created the needed filegroups, files, partition scheme, and partition function, but when I began creating the clustered index it consumed too much space (more than 200 GB) and was still running, so I stopped it.

My question: is there a way to partition an existing huge table without creating a clustered index, which takes too much space? Or is there a way to write the new incoming data to a different disk?

CodePudding user response:

To avoid the pain of creating a 15 TB index (which you could, of course, create on a new filegroup on the new disks), you can instead create a new partitioned table from scratch, start writing all new incoming data there, and slowly move the old data over.
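A minimal sketch of the "new partitioned table" part, assuming a hypothetical `dbo.BigTable` partitioned monthly on a `datetime2` column and a hypothetical new filegroup `FG_NewDisk` (all names, boundary values, and columns here are illustrative, not from the question):

```sql
-- Sketch only: table, column, and filegroup names are hypothetical.
-- Monthly RANGE RIGHT partition function on a datetime2 column.
CREATE PARTITION FUNCTION pf_ByMonth (datetime2)
AS RANGE RIGHT FOR VALUES ('2023-01-01', '2023-02-01', '2023-03-01' /* ... */);

-- Map all partitions to the new filegroup on the new disk
-- (or list one filegroup per partition for finer placement).
CREATE PARTITION SCHEME ps_ByMonth
AS PARTITION pf_ByMonth ALL TO ([FG_NewDisk]);

-- The new table is partitioned from the start, so there is no
-- 15 TB index rebuild on the existing table.
CREATE TABLE dbo.BigTable_New
(
    Id        bigint       NOT NULL,
    CreatedAt datetime2    NOT NULL,
    Payload   varchar(max) NULL,
    -- The partitioning column must be part of the clustered key.
    CONSTRAINT PK_BigTable_New PRIMARY KEY CLUSTERED (Id, CreatedAt)
) ON ps_ByMonth (CreatedAt);
```

`RANGE RIGHT` with date boundaries keeps each month's data in its own partition; create enough boundary values up front to cover existing data plus a reasonable amount of future data.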

Borrowing from my own answer here:

  • create a new partitioned table on the new disk, with enough partitions to comfortably divide up all the existing data plus a reasonable amount of future data
  • create a view that UNION ALLs the two tables (it may be simplest to rename the current table and temporarily replace it with a view bearing the old table name)
  • direct new writes to the new, partitioned table
    • hopefully you control writes via a stored procedure, or via a minimal number of places where you can change the target name
    • if not, you can use an INSTEAD OF trigger on the view to direct writes to the partitioned table
  • in the background, start running `DELETE TOP (@batchsize) ... OUTPUT deleted.* INTO new_table ... FROM old_table` in a loop
    • it doesn't matter how long this takes to finish; the trick is to tune the batch size so it balances throughput against not blocking readers for too long, and to make sure log backups run between every n batches if they aren't already scheduled frequently enough (some more info on that here)
  • once all the data is backfilled, you can drop the old table, then change the view to no longer UNION ALL (or get rid of the view and rename the new table)
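The rename-plus-view and the batched backfill steps above can be sketched roughly as follows, again assuming the hypothetical names `dbo.BigTable`, `dbo.BigTable_Old`, and `dbo.BigTable_New` (batch size and delay are placeholders to tune against your own workload):

```sql
-- Sketch only: object names, columns, and batch size are hypothetical.
EXEC sp_rename 'dbo.BigTable', 'BigTable_Old';
GO
-- Readers keep using the old name via a UNION ALL view.
CREATE VIEW dbo.BigTable AS
    SELECT Id, CreatedAt, Payload FROM dbo.BigTable_Old
    UNION ALL
    SELECT Id, CreatedAt, Payload FROM dbo.BigTable_New;
GO
-- Background backfill: move rows in batches until the old table is empty.
DECLARE @batchsize int = 5000, @rows int = 1;
WHILE @rows > 0
BEGIN
    DELETE TOP (@batchsize) o
    OUTPUT deleted.Id, deleted.CreatedAt, deleted.Payload
        INTO dbo.BigTable_New (Id, CreatedAt, Payload)
    FROM dbo.BigTable_Old AS o;

    SET @rows = @@ROWCOUNT;

    -- Brief pause between batches to reduce blocking and give
    -- log backups room to run.
    WAITFOR DELAY '00:00:01';
END
```

`DELETE ... OUTPUT deleted.* INTO ...` moves each batch atomically, so a row is never visible in both tables at once through the view.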

If it takes two weeks to backfill all the old data into the new partition scheme, so what? Users aren't waiting two weeks; they only ever wait on an individual batch (and even then, that will largely be the folks querying the data that is being moved over, not the new data).

CodePudding user response:

The main idea is to save the coming data onto different disks by creating new filegroup where the current disk can't be expanded any more.

Or is there a way to save the new coming data to a different disk?

Yes. Partitioning is gross overkill for this. Just add a new file to the existing filegroup and place that file on the new disk. Almost all new allocations will then come from the new file on the new disk, thanks to SQL Server's proportional fill algorithm.
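A minimal sketch of adding such a file, with a hypothetical database name, path, and sizes (adjust all of these to your environment):

```sql
-- Sketch only: database name, path, and sizes are hypothetical.
-- Adds a data file on the new disk to the existing filegroup;
-- proportional fill will favor the file with the most free space.
ALTER DATABASE MyDb
ADD FILE
(
    NAME = MyDb_Data2,
    FILENAME = 'E:\SQLData\MyDb_Data2.ndf',
    SIZE = 100GB,
    FILEGROWTH = 10GB
)
TO FILEGROUP [PRIMARY];
```

Pre-sizing the file (rather than relying on small autogrowth increments) keeps the free-space imbalance large, which is what steers proportional fill toward the new disk.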
