Creating external hive table from parquet file which contains json string


I have a parquet file which is stored in a partitioned directory. The partition layout is:

/dates=*/hour=*/something.parquet.

The content of the parquet file looks as follows:

{a:1,b:2,c:3}.

This is JSON data, and I want to create an external Hive table over it.

My approach:

CREATE EXTERNAL TABLE test_table (a int, b int, c int) PARTITIONED BY (dates string, hour string) STORED AS PARQUET LOCATION '/user/output/';

After that I run MSCK REPAIR TABLE test_table; but I get the following output:

hive> select * from test_table;
OK
NULL    NULL    NULL    2021-09-27      09

The data columns a, b, and c are null. I think I have to define the JSON schema somehow, but I have no idea how to proceed further. Any help would be great.

CodePudding user response:

Create the table with the same schema as the parquet file, i.e. a single string column holding the JSON:

CREATE EXTERNAL TABLE test_table (value string) PARTITIONED BY (dates string, hour string) STORED AS PARQUET LOCATION '/user/output/';

Run MSCK REPAIR TABLE to mount the partitions:

MSCK REPAIR TABLE test_table;

Parse the value in the query:

select e.a, e.b, e.c
  from test_table t
       lateral view json_tuple(t.value, 'a', 'b', 'c') e as a, b, c;

Cast the values to int if necessary: cast(e.a as int) as a
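
Putting it together, the full query with the casts applied might look like this (a sketch reusing the table and JSON keys from above):

select cast(e.a as int) as a,
       cast(e.b as int) as b,
       cast(e.c as int) as c,
       t.dates,
       t.hour
  from test_table t
       lateral view json_tuple(t.value, 'a', 'b', 'c') e as a, b, c;

If this parsing is needed often, the select can also be wrapped in a view so downstream queries see plain typed columns.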
