I have this setup in mind:
Python SDK sending predefined JSON -> AWS Kinesis Firehose -> convert the data to Parquet using an AWS Glue schema -> save the data to S3 (whether the conversion succeeds or not).
While sending primitive types such as strings, ints and booleans is easy, sending an array or a struct isn't trivial at all. I keep getting weird error messages like:
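For reference, the sending side of this setup can be sketched roughly as follows. This is only a minimal sketch: it assumes boto3's Firehose client, the stream name `my-delivery-stream` is hypothetical, and the payload fields are illustrative.

```python
import json


def build_record(payload):
    """Serialize a payload as one newline-terminated JSON document,
    wrapped in the {"Data": bytes} shape that PutRecord expects."""
    data = json.dumps(payload, separators=(",", ":")) + "\n"
    return {"Data": data.encode("utf-8")}


def send(client, stream_name, payload):
    """Send one record; `client` is expected to be boto3.client("firehose")."""
    return client.put_record(
        DeliveryStreamName=stream_name,
        Record=build_record(payload),
    )


# Usage (needs AWS credentials; the stream name is a placeholder):
# import boto3
# firehose = boto3.client("firehose")
# send(firehose, "my-delivery-stream",
#      {"user": {"name": "x", "id": 1, "is_bla": True}, "tags": ["a", "b"]})
```

Passing the client in as an argument keeps the serialization step easy to test without touching AWS.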
The schema is invalid. Error parsing the schema: Error: type expected at the position 0 of 'STRUCT<name:STRING,id:BIGINT,is_bla:BOOLEAN>' but 'STRUCT' is found.
OR
The schema is invalid. Error parsing the schema: Error: type expected at the position 0 of 'ARRAY' but 'ARRAY' is found.
- Why am I getting those error messages?
- Is there proper documentation, with examples, for the schema data types?
I could only find this, saying that the Column Type should match the "Single-line string pattern".
CodePudding user response:
I'll answer my own question:
There is some delay between saving the Glue schema and being able to send data to Firehose against it. The updated JSON I sent was still processed with the old schema, hence the errors.
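One way to at least confirm what Glue itself currently reports before sending data is a small pre-flight check like the one below. It is a sketch that assumes boto3's Glue client and its `get_table` call; the database and table names are hypothetical. It only checks Glue's side, so it cannot tell whether Firehose has already picked up the new schema, and some waiting may still be needed.

```python
def glue_column_types(glue_client, database, table):
    """Return {column name: type string} as Glue currently reports it.
    `glue_client` is expected to be boto3.client("glue")."""
    resp = glue_client.get_table(DatabaseName=database, Name=table)
    columns = resp["Table"]["StorageDescriptor"]["Columns"]
    return {col["Name"]: col["Type"] for col in columns}


def schema_mismatches(expected, actual):
    """Map each column whose type is missing or different
    to a (wanted type, reported type) pair."""
    return {
        name: (want, actual.get(name))
        for name, want in expected.items()
        if actual.get(name) != want
    }


# Usage (needs AWS credentials; names are placeholders):
# import boto3
# actual = glue_column_types(boto3.client("glue"), "my_db", "my_table")
# print(schema_mismatches({"tags": "array<string>"}, actual))
```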
Also, according to this and that, we have to validate some naming conventions ourselves. It's quite unfortunate that AWS doesn't do this when the schema is created.
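Since the validation has to happen on our side, a local check could look roughly like this. It is a sketch under assumptions that the posts above do not confirm: that the type strings must be lowercase Hive-style (`array<...>`, `map<...>`, `struct<name:type,...>`), which is consistent with the uppercase `STRUCT` being rejected in the error, and that column names should be lowercase letters, digits and underscores. The primitive list is deliberately incomplete (no `decimal(p,s)`, for example).

```python
import re

# Assumed subset of primitive type names; not an exhaustive list.
PRIMITIVES = {
    "boolean", "tinyint", "smallint", "int", "bigint",
    "float", "double", "string", "binary", "date", "timestamp",
}
# Assumed naming convention for column and struct field names.
NAME_RE = re.compile(r"[a-z_][a-z0-9_]*")


def is_valid_column_name(name):
    return NAME_RE.fullmatch(name) is not None


def split_top_level(text):
    """Split on commas that are not nested inside <...>."""
    parts, depth, start = [], 0, 0
    for i, ch in enumerate(text):
        if ch == "<":
            depth += 1
        elif ch == ">":
            depth -= 1
        elif ch == "," and depth == 0:
            parts.append(text[start:i])
            start = i + 1
    parts.append(text[start:])
    return parts


def is_valid_type(t):
    """Recursively check a lowercase Hive-style type string."""
    if t in PRIMITIVES:
        return True
    if t.startswith("array<") and t.endswith(">"):
        return is_valid_type(t[len("array<"):-1])
    if t.startswith("map<") and t.endswith(">"):
        parts = split_top_level(t[len("map<"):-1])
        return len(parts) == 2 and all(is_valid_type(p) for p in parts)
    if t.startswith("struct<") and t.endswith(">"):
        for field in split_top_level(t[len("struct<"):-1]):
            name, sep, field_type = field.partition(":")
            if not sep or not is_valid_column_name(name):
                return False
            if not is_valid_type(field_type):
                return False
        return True
    return False
```

With these assumptions, `is_valid_type("struct<name:string,id:bigint,is_bla:boolean>")` returns `True`, while the uppercase `"STRUCT<name:STRING,id:BIGINT,is_bla:BOOLEAN>"` and a bare `"ARRAY"` both return `False`.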