Home > Software engineering >  Hive, how to drop partition, Error while compiling statement: FAILED: expecting set null in drop par
Hive, how to drop partition, Error while compiling statement: FAILED: expecting set null in drop par

Time:11-26

I have a table in Hive which I would like to drop its partition keys for later using other partition keys.

The location of parquet file is in Amazon S3. The table, I am working on, is partitioned by date_year and date_month columns. There are 143 partitions altogether. Now I am trying to drop the partition by executing the following command:

Alter Table `my_hive_db`.`my_table`
Drop PARTITION (`date_year` , `date_month` );

I however, get this error:

Error while compiling statement: FAILED: ParseException line 48:28 mismatched input ',' expecting set null in drop partition statement.

enter image description here

If it helps, my table definition is as follows:

CREATE External Table `my_hive_db`.`my_table`(
    `col_id` bigint,
    `result_section__col2` string,
    `result_section_col3` string ,
    `result_section_col4` string,
    `result_section_col5` string,
    `result_section_col6__label` string,
    `result_section_col7__label_id` bigint ,
    `result_section_text` string ,
    `result_section_unit` string,
    `result_section_col` string ,
    `result_section_title` string,
    `result_section_title_id` bigint,
    `col13` string,
    `timestamp` bigint,
    `date_day` string
    )
    PARTITIONED BY ( 
      `date_year` string, 
      `date_month` string)
    ROW FORMAT SERDE 
      'org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe' 
    STORED AS INPUTFORMAT 
      'org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat' 
    OUTPUTFORMAT 
      'org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat'
    LOCATION
      's3a://some/where/in/amazon/s3';

The most important thing is, I really don't want to delete underlying files. I only want to drop the partition key so that later I can repartition the table with different combination of columns. The question is how can I alter the table, droping the partitions, but still keeping those partition keys in the table as normal columns.

I am open to achieve this by either Hive or Spark. However, Hive is more preferred at this stage.

Thank you for your valuable input.

CodePudding user response:

I do not think you can re-partition hive table based on different column(s). Because partitions are mapped to physical folders in HDFS and it can not be redistributed as needed.

So, only option is to -

  1. backup the table into a bkp table.
  2. drop original table and recreate the table with new partition.
  3. insert into new original table from backup.

Alternately you can create a new table with new partitions and insert from old table and then drop old and rename new table.

  • Related