I have a table in Hive which I would like to drop its partition keys for later using other partition keys.
The location of parquet file is in Amazon S3. The table, I am working on, is partitioned by date_year and date_month columns. There are 143 partitions altogether. Now I am trying to drop the partition by executing the following command:
Alter Table `my_hive_db`.`my_table`
Drop PARTITION (`date_year` , `date_month` );
I however, get this error:
Error while compiling statement: FAILED: ParseException line 48:28 mismatched input ',' expecting set null in drop partition statement.
If it helps, my table definition is as follows:
CREATE External Table `my_hive_db`.`my_table`(
`col_id` bigint,
`result_section__col2` string,
`result_section_col3` string ,
`result_section_col4` string,
`result_section_col5` string,
`result_section_col6__label` string,
`result_section_col7__label_id` bigint ,
`result_section_text` string ,
`result_section_unit` string,
`result_section_col` string ,
`result_section_title` string,
`result_section_title_id` bigint,
`col13` string,
`timestamp` bigint,
`date_day` string
)
PARTITIONED BY (
`date_year` string,
`date_month` string)
ROW FORMAT SERDE
'org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe'
STORED AS INPUTFORMAT
'org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat'
OUTPUTFORMAT
'org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat'
LOCATION
's3a://some/where/in/amazon/s3';
The most important thing is, I really don't want to delete underlying files. I only want to drop the partition key so that later I can repartition the table with different combination of columns. The question is how can I alter the table, droping the partitions, but still keeping those partition keys in the table as normal columns.
I am open to achieve this by either Hive or Spark. However, Hive is more preferred at this stage.
Thank you for your valuable input.
CodePudding user response:
I do not think you can re-partition hive table based on different column(s). Because partitions are mapped to physical folders in HDFS and it can not be redistributed as needed.
So, only option is to -
- backup the table into a bkp table.
- drop original table and recreate the table with new partition.
- insert into new original table from backup.
Alternately you can create a new table with new partitions and insert from old table and then drop old and rename new table.