In Hive, I have an ORC table with 10 buckets that already holds 1 TB of data. If I increase the bucket count, will the existing data be split across the new buckets automatically, or do I need to reload the data into the table? Is there any way to alter the bucket count? I am new to bucketing. Can someone help answer this question?
CodePudding user response:
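If you use ALTER TABLE mytable CLUSTERED BY (my_field) INTO 10 BUCKETS (with your new bucket count in place of 10), only the table metadata changes: existing data will not be redistributed, so the files already on disk keep the old bucket layout. Only rows inserted afterwards are bucketed according to the new definition.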
If you want a clean approach, follow these steps:
- Create a new table with new structure.
- Insert data into the new table from old table.
- Drop old table.
This redistributes all of the data into the new buckets.
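The steps above could look roughly like the following HiveQL. This is only a sketch: the table name mytable_new, the column other_field, the column types, and the bucket count of 20 are assumptions for illustration, so substitute your real schema. The final rename is an optional extra, not part of the steps above.

```sql
-- 1. Create a new table with the new bucket count.
--    Column names and types here are placeholders (assumed), not the real schema.
CREATE TABLE mytable_new (
  my_field    INT,
  other_field STRING
)
CLUSTERED BY (my_field) INTO 20 BUCKETS
STORED AS ORC;

-- Hive 1.x only: bucketing must be enforced explicitly on insert.
-- (In Hive 2.0 and later it is always enforced and this setting no longer exists.)
-- SET hive.enforce.bucketing = true;

-- 2. Copy the data; Hive hashes my_field and writes each row to its new bucket.
INSERT OVERWRITE TABLE mytable_new
SELECT my_field, other_field
FROM mytable;

-- 3. Drop the old table. For a managed table this deletes its data,
--    so compare row counts between the two tables before running it.
DROP TABLE mytable;

-- Optional (assumed convenience step): give the new table the old name.
ALTER TABLE mytable_new RENAME TO mytable;
```

With 1 TB of data the INSERT is a full rewrite of the table, so expect it to take a while and to need enough free space to hold both copies until the old table is dropped.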