Hive: when the bucket count is increased, how is existing data split into the new buckets?

Time:07-24

In Hive, I have an ORC-formatted table with 10 buckets, and the table already holds 1 TB of data. If I increase the bucket count, will my existing data be split across the new buckets automatically, or do I need to reload the data into the table? Is there any way to alter the bucket count? I am new to bucketing concepts. Can someone help answer this question?

Answer:

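If you use ALTER TABLE mytable CLUSTERED BY (my_field) INTO 20 BUCKETS, existing data will not be redistributed: the statement changes only the table's metadata, not the files already written. Only rows inserted after the change are bucketed with the new count, so the old files no longer match the table definition.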
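A minimal sketch of this metadata-only change, assuming a non-transactional table named mytable bucketed on my_field (placeholder names taken from the statement above). After the ALTER, DESCRIBE FORMATTED should report the new bucket count, while the files already stored under the table's location are left untouched.

```sql
-- Placeholder names: mytable (existing table), my_field (bucketing column).
-- Only the bucket count stored in the metastore changes; no files are rewritten.
ALTER TABLE mytable CLUSTERED BY (my_field) INTO 20 BUCKETS;

-- The "Num Buckets" field in the output now shows 20,
-- but the existing data files still follow the old 10-bucket layout.
DESCRIBE FORMATTED mytable;
```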

For a clean redistribution, follow these steps:

  1. Create a new table with the new bucket count.
  2. Insert the data from the old table into the new table.
  3. Drop the old table.

This rewrites all of the existing data into the new buckets.
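The three steps can be sketched in HiveQL as follows. This assumes an unpartitioned, non-transactional table; mytable, my_field and the temporary name mytable_new are placeholders, and the final rename is an optional extra that keeps existing queries pointing at the original name. The commented-out SET line is only relevant on Hive versions before 2.0, where bucketing was not enforced on insert by default.

```sql
-- Placeholder names: mytable (existing 10-bucket ORC table), mytable_new (temporary).

-- On Hive versions before 2.0, uncomment so the INSERT honours the bucket spec:
-- SET hive.enforce.bucketing = true;

-- 1. Create an empty copy of the table definition, then raise its bucket count.
--    Changing bucket metadata is safe here because the new table holds no data yet.
CREATE TABLE mytable_new LIKE mytable;
ALTER TABLE mytable_new CLUSTERED BY (my_field) INTO 20 BUCKETS;

-- 2. Rewrite every row; Hive hashes my_field again and writes the rows into 20 buckets.
INSERT OVERWRITE TABLE mytable_new
SELECT * FROM mytable;

-- Sanity check before dropping anything: both counts should match.
SELECT COUNT(*) FROM mytable;
SELECT COUNT(*) FROM mytable_new;

-- 3. Drop the old table and reuse its name for the new one.
DROP TABLE mytable;
ALTER TABLE mytable_new RENAME TO mytable;
```

For a managed table, DROP TABLE also deletes the old data files, which is why the row counts are compared first. If the table is partitioned, the INSERT needs a PARTITION clause (and, for dynamic partitions, the corresponding settings), which this sketch does not cover.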
