Hive: when the bucket count is increased, how is the existing data split into the new buckets?

In Hive, I have an ORC-formatted table with 10 buckets, and it already holds 1 TB of data. If I increase the bucket count, will my existing data be split across the new buckets automatically, or do I need to reload the data into the table? Is there any way to alter the bucket count? I am new to bucketing concepts; can someone help me answer this question?

Answer:

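If you run ALTER TABLE mytable CLUSTERED BY (my_field) INTO 20 BUCKETS, Hive only updates the table metadata. The existing data is not redistributed: it stays in the files that were written for the old 10 buckets, so those files no longer match the declared layout until the data is rewritten. Only rows inserted after the change are bucketed with the new count.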
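A minimal sketch of the metadata-only change, assuming a non-partitioned table named mytable bucketed on my_field (the names from the statement above) and an assumed new bucket count of 20. After the ALTER, DESCRIBE FORMATTED should report the new bucket count, while the files already in the table directory are left untouched.

```sql
-- Assumed names: mytable / my_field, with 20 as the new bucket count.
-- This only rewrites the table metadata; the ORC files written for
-- 10 buckets stay exactly where they are.
ALTER TABLE mytable CLUSTERED BY (my_field) INTO 20 BUCKETS;

-- The "Num Buckets" line should now show 20, even though the existing
-- files were still hashed into 10 buckets.
DESCRIBE FORMATTED mytable;
```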

If you want a clean way to redistribute the existing data, follow these steps:

1. Create a new table with the new structure (the same columns, clustered into the new number of buckets).
2. Insert the data from the old table into the new table.
3. Drop the old table.

Because the insert rewrites every row, this redistributes all of the data into the new buckets.
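The three steps above can be sketched in HiveQL as follows. This is a minimal sketch under stated assumptions, not a drop-in script: mytable_new and the column list (my_field INT, other_col STRING) are hypothetical placeholders for the real schema, 20 is an assumed bucket count, and the table is assumed to be non-partitioned. The SET line matters only on Hive versions before 2.0; later releases always enforce bucketing on insert.

```sql
-- Hypothetical schema: replace my_field/other_col with the real columns.
-- Needed only before Hive 2.0; newer versions always enforce bucketing.
SET hive.enforce.bucketing = true;

-- 1. New table with the same columns, clustered into 20 buckets.
CREATE TABLE mytable_new (
  my_field  INT,
  other_col STRING
)
CLUSTERED BY (my_field) INTO 20 BUCKETS
STORED AS ORC;

-- 2. Rewrite every row; Hive re-hashes my_field into the 20 new buckets.
INSERT OVERWRITE TABLE mytable_new
SELECT my_field, other_col FROM mytable;

-- Sanity check: both counts should be equal.
SELECT COUNT(*) FROM mytable;
SELECT COUNT(*) FROM mytable_new;

-- 3. Drop the old table, then (optionally) reuse its name.
DROP TABLE mytable;
ALTER TABLE mytable_new RENAME TO mytable;
```

Compare the two counts before running DROP TABLE. If mytable is an EXTERNAL table, DROP TABLE removes only its metadata, so the old ORC files stay in place and have to be deleted separately.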