I have a Parquet file which is stored in a partitioned directory. The partition layout is:
/dates=*/hour=*/something.parquet
The content of the Parquet file looks as follows:
{a:1,b:2,c:3}
This is JSON data and I want to create an external Hive table over it.
My approach:
CREATE EXTERNAL TABLE test_table (a int, b int, c int) PARTITIONED BY (dates string, hour string) STORED AS PARQUET LOCATION '/user/output/';
After that I run MSCK REPAIR TABLE test_table;
but I get the following output:
hive> select * from test_table;
OK
NULL NULL NULL 2021-09-27 09
The other three columns are NULL. I think I have to define the JSON schema somehow, but I have no idea how to proceed. Any help would be great.
CodePudding user response:
Create the table with the same schema as the Parquet file, i.e. a single string column holding the JSON:
CREATE EXTERNAL TABLE test_table (value string) PARTITIONED BY (dates string, hour string) STORED AS PARQUET LOCATION '/user/output/';
Run repair table to mount partitions:
MSCK REPAIR TABLE test_table;
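To confirm that the partitions were mounted, you can list them (this simply reuses the table name from above):
SHOW PARTITIONS test_table;
which should show entries such as dates=2021-09-27/hour=09.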
Parse the value in the query:
select e.a, e.b, e.c
from test_table t
lateral view json_tuple(t.value, 'a', 'b', 'c') e as a, b, c;
Cast the values to int if necessary: cast(e.a as int) as a
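Putting it together, a minimal sketch of the final query with the casts applied (the column aliases are just illustrative):
select cast(e.a as int) as a,
       cast(e.b as int) as b,
       cast(e.c as int) as c,
       t.dates,
       t.hour
from test_table t
lateral view json_tuple(t.value, 'a', 'b', 'c') e as a, b, c;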