Home > database >  In T-SQL, why is the ISJSON function not filtering out bad data when used in a CTE?
In T-SQL, why is the ISJSON function not filtering out bad data when used in a CTE?

Time:01-26

I have a table with a varchar(max) column that stores JSON data and one of the records has an extra comma at the end of an array in the JSON. The following statement results in an error

Unexpected character ',' is found...

Code:

SELECT JSON_QUERY(JsonField, '$.SomeProperty') AS JsonData
FROM MyTable

But the following code works because of the WHERE clause:

SELECT JSON_QUERY(JsonField, '$.SomeProperty') AS JsonData
FROM MyTable 
WHERE ISJSON(JsonField) = 1

I can use that statement as a CTE as follows:

WITH cte AS 
(
    SELECT JSON_QUERY(JsonField, '$.SomeProperty') AS JsonData
    FROM MyTable 
    WHERE ISJSON(JsonField) = 1
)
SELECT JsonData 
FROM cte

But when I try to filter on the cte, I get the same error:

WITH cte AS 
(
    SELECT JSON_QUERY(JsonField, '$.SomeProperty') AS JsonData 
    FROM MyTable 
    WHERE ISJSON(JsonField) = 1
)
SELECT JsonData 
FROM cte 
WHERE LEN(JsonData) > 500

Obviously, I can move the 2nd WHERE clause up to the definition of the CTE, but there are other filters I want to use and this is the the most straight forward filter I can use to provide this example.

Am I missing something or is this a bug? My workaround is to use a temp table instead.

CodePudding user response:

You can use the following expression so that the JSON_QUERY is only evaluated if the ISJSON condition is met.

SELECT CASE WHEN ISJSON(JsonField) = 1 THEN JSON_QUERY(JsonField, '$.SomeProperty') END AS JsonData
FROM MyTable 
WHERE ISJSON(JsonField) = 1

This is mostly reliable. The documentation says

You should only depend on order of evaluation of the WHEN conditions for scalar expressions (including non-correlated subqueries that return scalars), not for aggregate expressions.

And this is a scalar expression - not an aggregate. (One other case to be aware of is if you test this with a single literal value you can also encounter the error at plan compilation time as SQL Server tries to constant fold it but this shouldn't affect you as you are selecting from a table)

But really you need to be using SQL Server 2022 so you can test ISJSON(JsonField, OBJECT).

Otherwise the string '1' will pass the ISJSON check (as ISJSON(JsonField, VALUE) is 1) but still barf at the JSON_QUERY.

I suppose on earlier versions you could also add an additional condition (to both CASE and WHERE) that JsonField is LIKE '{%'

CodePudding user response:

This is a common issue. You'll see the same kind of thing with people trying to use IsNumeric() or IsDate() to pre-filter a field for number or date values, and the answer is the same: a database server may decide it's more efficient to run the function call BEFORE the where clause, and there's really nothing you can do about it except design the schema and validation to not have mixed or broken data in the first place.

So you can use a function to select only the records where IsJSON() is 1, but if you actually try to parse into the JSON data all bets are off. It's more useful to select records where IsJSON() is 0, without parsing the JSON on the DB server, so you can fix them in client code.

There's also Try_Cast() now, but I don't recall whether it supports JSON.

  • Related