Home > Enterprise >  Snowflake error when trying to use parse_json with valid JSON - Error parsing JSON: missing comma, l
Snowflake error when trying to use parse_json with valid JSON - Error parsing JSON: missing comma, l

Time:02-24

This is valid JSON (I've run it against two JSON validators and also parsed it using powershell):

{
    "actionCD": "error",
    "NotesTXT": "\"Exception call timeout\""
}

This is not valid JSON:

{
    "actionCD": "error",
    "NotesTXT": "\\"Exception call timeout\\""
}

However, the parse_json function yields a failure with the first example:

SELECT '{ "actionCD": "error", "NotesTXT": "\"Exception call timeout\"" }' as json_str
,PARSE_JSON(json_str) as json;

Error parsing JSON: missing comma, pos 38

And unexpectedly, the snowflake parse_json function works with the invalid json:

SELECT '{ "actionCD": "error", "NotesTXT": "\\"Exception call timeout\\"" }' as json_str
,PARSE_JSON(json_str) as json;

<No Errors>

This is leaving me throughly flummoxxed and uncertain on how to proceed. I'm using powershell programmatically to create valid JSON and then trying to insert valid JSON into snowflake using INSERT INTO ()...SELECT ...

Here is the insert statement I'm trying to build in powershell:

INSERT INTO DBNAME.SCHEMANAME.TABLENAME(
        RunID
       ,jsonLogTXT
        
     ) SELECT
       '$RunID'
       ,parse_json('$($mylogdata | ConvertTo-Json)')
     ;


# where $($mylogdata | ConvertTo-Json) outputs valid json, and from time-to-time includes \" to escape the double quotes. 
# But snowflake fails because snowflake wants \\" to escape the double quotes.

Is this expected? (obviously I find it unexpected :-) ). What is the recommendation here? (Should I search my json-stored-as-a-string in powershell for " and replace it with \" before sending it on to snowflake? That feels really hacky, though?)

CodePudding user response:

This is MOSTLY expected. Snowflake strings use backslash as an escape character BEFORE the JSON parsing happens.

As such: "\\"content\\"" would get parsed by snowflake as "\"content\"" which is what would get fed into the JSON parser, and be treated as valid JSON.

Similar issues can come up with single quotes.

Replacing \ with \\ before sending it to snowflake might work, although when I've run into these types of issues I find it's often accompanied by other encryption/parsing errors. I find it's usually more appropriate to change the approach and have snowflake parse a file that has JSON, for example. Then you don't have the extra round of escaping characters going on. That's a bigger change to your process though.

Snowflake's documentation has a quick note on this topic here: https://docs.snowflake.com/en/sql-reference/functions-regexp.html#escape-characters-and-caveats

CodePudding user response:

The code you posted shows the answer:

SELECT '{ "actionCD": "error", "NotesTXT": "\\"Exception call timeout\\"" }' as json_str
,PARSE_JSON(json_str) as json;
JSON_STR JSON
{ "actionCD": "error", "NotesTXT": ""Exception call timeout"" } { "NotesTXT": ""Exception call timeout"", "actionCD": "error" }

What you see is not what "you entered" thus what PARSE_JSON is parsing is what you note is "valid JSON"

The answer is very common to many computer environment, and that is the environment is reading you input, and it acts on some of it, thus the here the SQL parser is reading your SQL and it see the single \ in the valid json and thinks you are starting an escape sequence and then complains about comma's being in the wrong place.

BASH (or PowerShell), Python, even Java requires you to understand the difference between the content of a string (aka you valid JSON) and how you have to represent it so it get past the language parser.

So how should to "INSERT JSON in snowflake" one general answer is not via INSERT commands if it's high volumne. Or if you don't want to make the string parser safe you can BASE64 encode the data (in powershell) and insert base64_decode(awesomestring)

which looks like eyAiYWN0aW9uQ0QiOiAiZXJyb3IiLCAiTm90ZXNUWFQiOiAiXCJFeGNlcHRpb24gY2FsbCB0aW1lb3V0XCIiIH0= thus

SELECT PARSE_JSON(base64_decode_string('eyAiYWN0aW9uQ0QiOiAiZXJyb3IiLCAiTm90ZXNUWFQiOiAiXCJFeGNlcHRpb24gY2FsbCB0aW1lb3V0XCIiIH0=')) as json_from_B64;

gives:

JSON_FROM_B64
{ "NotesTXT": ""Exception call timeout"", "actionCD": "error" }
  • Related