How to define AWS GLUE schema for JSON sent from python SDK to firehose?

Time:05-30

I have this setup in mind:

Python SDK sends predefined JSON -> AWS Kinesis Firehose -> converts the data to Parquet using an AWS Glue schema -> saves the data to S3 (whether the conversion succeeds or not).
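For reference, the sending side of this setup could look roughly like the sketch below. It assumes boto3 is installed and credentials are configured; the helper names `encode_record` and `send_record` and the stream name are hypothetical, and the trailing newline is a common convention rather than something required by the setup described here.

```python
import json


def encode_record(payload: dict) -> bytes:
    """Serialize one JSON object as a single UTF-8 encoded line."""
    return (json.dumps(payload) + "\n").encode("utf-8")


def send_record(stream_name: str, payload: dict) -> dict:
    """Send one record to a Firehose delivery stream (stream_name is a placeholder)."""
    # Imported lazily so encode_record can be used and tested without AWS access.
    import boto3

    client = boto3.client("firehose")
    return client.put_record(
        DeliveryStreamName=stream_name,
        Record={"Data": encode_record(payload)},
    )
```

Calling `send_record("my-delivery-stream", {"name": "a", "id": 1, "is_bla": True})` would then push one record matching the primitive fields mentioned in the question.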

While sending primitive types like strings, ints and booleans is easy, sending an array or struct isn't trivial at all. I keep getting odd error messages such as:

The schema is invalid. Error parsing the schema: Error: type expected at the position 0 of 'STRUCT<name:STRING,id:BIGINT,is_bla:BOOLEAN>' but 'STRUCT' is found.

OR

The schema is invalid. Error parsing the schema: Error: type expected at the position 0 of 'ARRAY' but 'ARRAY' is found.

  1. Why am I getting those error messages?
  2. Is there proper documentation, or are there examples, for schema data types? I could only find this, which says the column Type should match the "Single-line string pattern".

CodePudding user response:

I'll answer my own question:

There is some delay between saving the Glue schema and Firehose using it. The updated JSON records I sent were still converted with the old schema, hence the errors.

Also, from this and that, we have to validate some naming conventions ourselves. It's quite unfortunate that AWS doesn't do this when the schema is created.
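A minimal sketch of that kind of self-validation is below. The rules it checks are assumptions, not an official AWS specification: column names limited to lowercase letters, digits and underscores, and Hive-style type strings written in lowercase (for example `struct<name:string,id:bigint,is_bla:boolean>` or `array<string>`) with balanced angle brackets. The helper names `check_column` and `glue_columns` are hypothetical.

```python
from __future__ import annotations

import re

# Assumed naming convention: lowercase letters, digits and underscores only.
_NAME_RE = re.compile(r"^[a-z_][a-z0-9_]*$")


def check_column(name: str, type_str: str) -> list[str]:
    """Return a list of problems found for one column (empty if none)."""
    problems = []
    if not _NAME_RE.match(name):
        problems.append(f"column name {name!r} is not lowercase snake_case")
    if type_str != type_str.lower():
        problems.append(f"type {type_str!r} contains uppercase characters")
    if type_str.count("<") != type_str.count(">"):
        problems.append(f"type {type_str!r} has unbalanced angle brackets")
    return problems


def glue_columns(schema: dict[str, str]) -> list[dict[str, str]]:
    """Build a Name/Type column list for a Glue table definition, or raise."""
    problems = [p for name, t in schema.items() for p in check_column(name, t)]
    if problems:
        raise ValueError("; ".join(problems))
    return [{"Name": name, "Type": t} for name, t in schema.items()]
```

Running `glue_columns` on the schema before saving it to Glue surfaces these problems locally instead of as a conversion error later. Given the delay described above, it may also help to wait before sending records that rely on a freshly saved schema.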
