I have a json that looks like this. No nesting.
[{"id": [1984262,1984260]}]
I want to create a table in Athena using sql such that I have a column "id" and each row in that column would contain a value from the array. Something like this
id
1984262
1984260
What I tried
CREATE EXTERNAL TABLE table1 (
id string
)
ROW FORMAT SERDE 'org.apache.hive.hcatalog.data.JsonSerDe'
LOCATION 's3://data-bucket/data.json';
and
CREATE EXTERNAL TABLE table2 (
id array<string>
)
ROW FORMAT SERDE 'org.apache.hive.hcatalog.data.JsonSerDe'
LOCATION 's3://data-bucket/data.json';
and
CREATE EXTERNAL TABLE table2 (
id array<bigint>
)
ROW FORMAT SERDE 'org.apache.hive.hcatalog.data.JsonSerDe'
LOCATION 's3://data-bucket/data.json';
When I preview the table I see empty rows with absolutely no data. Please help.
CodePudding user response:
long story short: your JSON file needs to be compliant with the JSON-SerDe.
To query json data with athena you need to define a JSON (de-)serializer. You chose Hive JSON SerDe. https://docs.aws.amazon.com/athena/latest/ug/json-serde.html
Now you data needs to be compliant with that serializer. For Hive JSON SerDe that means that each line needs to be a single line json that corresponds to one record. For you that would mean:
{ "id" : 1984262 }
{ "id" : 1984260 }
and the corresponding table definition would be
CREATE EXTERNAL TABLE table1 (
id bigint
)
ROW FORMAT SERDE 'org.apache.hive.hcatalog.data.JsonSerDe'
LOCATION 's3://data-bucket/data.json';
https://github.com/rcongiu/Hive-JSON-Serde/blob/develop/README.md