Behavior of subquery's where against where of query-CodePudding

Could you explain me pls the behavior of where in subquery against where of the query? I have a query

SELECT
        [Object_Id]     = S.[Object_Id],
        [Schema]        = S.[Schema],
        [Name]          = S.[Name],
        [Type]          = S.[Type],
       SubString(S.[definition], S.[StartIndex],  S.[EndIndex] - S.[StartIndex] 12),
        S.[definition], S.[StartIndex], S.[EndIndex]
    FROM
    (
        SELECT
            [Object_Id]         = P.[object_id],
            [Schema]            = Schema_Name(P.[schema_id]),
            [Name]              = P.[name],
            [Type]              = P.[type],
            [Definition]        = S.[definition],
            [StartIndex]        = X.[StartIndex],
            [EndIndex]          = X.[EndIndex]
        FROM        sys.objects         P
        INNER JOIN  sys.sql_modules     S   ON S.[object_id] = P.[object_id]
        CROSS APPLY
        (
            SELECT
                [StartIndex]     =  CharIndex('<'   'Generator ', S.[definition]),
                [EndIndex]       =  CharIndex('<'  '/Generator>', S.[definition])
        ) X
        WHERE  P.[schema_id] <> Schema_Id('SQL')
            and P.[object_id] >= 69665580 --69665579
            and P.[object_id] <= 72985424 --72985424   
            and X.[StartIndex] > 1 
            AND X.[StartIndex] < X.[EndIndex]
    ) S
    WHERE 
        SubString(S.[definition], S.[StartIndex],  S.[EndIndex] - S.[StartIndex] 12) is not null

It returns error "Invalid length parameter passed to the LEFT or SUBSTRING function". The problem is recordset of the subquery without 'Where' has records which lead to this error, but with the where clause subquery returns only one correct row. If I insert record form the subquery into a table variable and then use this table in a query it works fine as expeted. But why doesn't it work with subquery? For me it looks like 'Where' of the whole query ignores filters applied to the subquery.

CodePudding user response：

The problem is that SQL Server is free to rearrange WHERE predicates, including pushing them down into derived tables (subqueries). While it will take calculations into account, it does not take into account possible runtime errors.

Therefore, it's possible that what has happened here is:

The CROSS APPLY calculated values were calulated first.
The WHERE SubString(S.[definition], S.[StartIndex], S.[EndIndex] - S.[StartIndex] 12) is not null was applied next.
Then the other WHERE P.[schema_id] <> Schema_Id('SQL') and P.[object_id] >= 69665580..... filters.

There is no way to avoid this happening directly, although using TOP and some other operators can often prevent it, but this is not guaranteed. The only way to avoid a runtime error is to null out invalid values using NULLIF (which compiles to a CASE which is the only guaranteed operator to not be rearranged). You should do this any time you use CHARINDEX and PATINDEX.

        CROSS APPLY
        (
            SELECT
                StartIndex     =  NULLIF(CharIndex('<'   'Generator ', S.definition), 0),
                EndIndex       =  NULLIF(CharIndex('<'  '/Generator>', S.definition), 0)
        ) X

There is one other improvment you should make: avoid SCHEMANAME and instead just join sys.schemas

...
[Schema]            = schm.schema_name),
...
INNER JOIN sys.schemas schm ON schm.schema_id = P.schema_id
...

Don't quote column names unless you really have to, it's really annoying to read.