I am loading my data to athena from json format using org.apache.hive.hcatalog.data.JsonSerDe
. Currently, I specify objects and lists with struct<...>
or array<...>
, but the goal is to have them in a final parquet table as varchar
or string
type. So that I could have it still in a JSON format but with a string/varchar type.
Could you advise some good way to do it?
Casting to json and afterwards to varchar did not work for me:
CREATE TABLE new_table AS SELECT cast(my_struct_col as varchar) my_struct_col FROM (
SELECT cast(my_struct_col as JSON) my_struct_col FROM raw_table
)
CodePudding user response:
Use json_format
to turn your json into varchar:
json_format(json)
→varchar
Returns the JSON text serialized from the input JSON value. This is inverse function tojson_parse()
:
SELECT json_format(JSON '[1, 2, 3]'); -- '[1,2,3]'
SELECT json_format(JSON '"a"'); -- '"a"'
Note that json representation for ROW
in Presto is just an array -Presto does not preserve the column names info:
SELECT CAST(CAST(ROW(123, 'abc', true)
AS ROW(v1 BIGINT, v2 VARCHAR, v3 BOOLEAN)) AS JSON);
-- JSON '[123,"abc",true]'
This was changed in Trino (Athena version 3 engine should be based on it):
SELECT CAST(CAST(ROW(123, 'abc', true) AS
ROW(v1 BIGINT, v2 VARCHAR, v3 BOOLEAN)) AS JSON);
-- JSON '{"v1":123,"v2":"abc","v3":true}'