Home > other >  Json to Athena table gives 0 results
Json to Athena table gives 0 results

Time:01-15

I have a json that looks like this. No nesting.

[{"id": [1984262,1984260]}]

I want to create a table in Athena using sql such that I have a column "id" and each row in that column would contain a value from the array. Something like this

id

1984262
1984260

What I tried

CREATE EXTERNAL TABLE table1 (
  id string
 )
 ROW FORMAT SERDE 'org.apache.hive.hcatalog.data.JsonSerDe'
 LOCATION 's3://data-bucket/data.json';

and

CREATE EXTERNAL TABLE table2 (
  id array<string>
 )
 ROW FORMAT SERDE 'org.apache.hive.hcatalog.data.JsonSerDe'
 LOCATION 's3://data-bucket/data.json';

and

CREATE EXTERNAL TABLE table2 (
  id array<bigint>
 )
 ROW FORMAT SERDE 'org.apache.hive.hcatalog.data.JsonSerDe'
 LOCATION 's3://data-bucket/data.json';

When I preview the table I see empty rows with absolutely no data. Please help.

CodePudding user response:

long story short: your JSON file needs to be compliant with the JSON-SerDe.

To query json data with athena you need to define a JSON (de-)serializer. You chose Hive JSON SerDe. https://docs.aws.amazon.com/athena/latest/ug/json-serde.html

Now you data needs to be compliant with that serializer. For Hive JSON SerDe that means that each line needs to be a single line json that corresponds to one record. For you that would mean:

{ "id" : 1984262 }
{ "id" : 1984260 }

and the corresponding table definition would be

 CREATE EXTERNAL TABLE table1 (
  id bigint
 )
 ROW FORMAT SERDE 'org.apache.hive.hcatalog.data.JsonSerDe'
 LOCATION 's3://data-bucket/data.json';

https://github.com/rcongiu/Hive-JSON-Serde/blob/develop/README.md

  • Related