Home > Blockchain >  HIVE - Cannot partition a table: semantic exception failure
HIVE - Cannot partition a table: semantic exception failure

Time:09-30

I'm not able to import data on partitioned table in Hive.

Here is how I create the table

CREATE TABLE IF NOT EXISTS title_ratings
(
tconst STRING,
averageRating DOUBLE,
numVotes INT
)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '\t'
STORED AS TEXTFILE
TBLPROPERTIES("skip.header.line.count"="1");

And then I load the data into it : LOAD DATA INPATH '/title.ratings.tsv.gz' INTO TABLE eval_hive_db.title_ratings;

It works fine till here. Now I want to create a dynamic partitioned table. First of all, I setup theses params:

SET hive.exec.dynamic.partition=true;
SET hive.exec.dynamic.partition.mode=nonstrict;

I now create my partitioned table:

CREATE TABLE IF NOT EXISTS title_ratings_part
(
tconst STRING,
numVotes INT
)
PARTITIONED BY (averageRating DOUBLE)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '\n'
STORED AS TEXTFILE;

insert into title_ratings_part partition(title_ratings) select tconst, averageRating, numVotes from title_ratings;

(I also tried with numVotes instead by the way)

And I receive this error: FAILED: ValidationFailureSemanticException eval_hive_db.title_ratings_part: Partition spec {title_ratings=null} contains non-partition columns

Someone can help me please? Ideally, I want to partition my table by averageRating (less than 2, between 2 and 4, and greater than 4)

CodePudding user response:

You can run this command to check if there are null values or not.

select count(averageRating) from title_ratings group by averageRating;

Now, if there are null values in this column then you will get the count, which you have to fill then apply partitioning again.

CodePudding user response:

Partition column is stored as last column in a table so while inserting you need to maintain correct order in select statement. Pls change order of columns in select.

insert into title_ratings_part partition(title_ratings)
Select 
Tconst, 
numVotes, 
averageRating --orderwise this should always be last column
from title_ratings
  • Related