Using C# CLR in SELECT and WHERE at the same time-CodePudding

I have a C# CLR doing multiple string manipulations. It returns an int. I'm calling it with

SELECT func(col1, col2) AS Score
INTO #A
FROM table;

SELECT *
INTO #B
FROM #A
WHERE Score > -1

If I do something like

SELECT func(col1, col2) AS Score
INTO #A
FROM table
WHERE func(col1, col2) > -1;

Is the CLR called and executed two times? In terms of performance/speed is there a better way to get the same result?

col1 is nvarchar(max), col2 is nvarchar(800). The function contains business logic.

There are about 10 billion rows/calculations

CodePudding user response：

You should not assume that it will be run only once. It may be, but the behavior may be plan-dependent, and later devs will have the same question. Instead, as @Larnu suggests, push function into a subquery/cte.

with q as
(
  SELECT func(col1, col2) AS Score
  FROM table
),
select *
INTO #A
from q
WHERE Score > -1;

CodePudding user response：

If your function is expensive and the number of unique pairs (col1, col2) is significantly less than the number of input records, try using this:

select B.score 
from 
(
    select func(A.col1,A.col2) as score
    from 
    (
        select distinct col1, col2 from table 
    ) A
    where func(A.col1,A.col2) > -1
) B
where B.score > -1

I think whether the function is deterministic or not - can be important when creating a query execution plan by the engine.

I know from experience that (with large tables) temporary tables can be a faster solution than subqueries and CTE's, you can try rewrite proposed query using #temptables.

CodePudding user response：

I wrote simple function taking two parameters and returning an integer. The function writes to external text file that was called with specified parameters, I was able to track how many times it was called.

Conclusion is: total number of calls is equal to sum of number of input records and number of records that meet criteria specified in WHERE clause.

SELECT don't know what WHERE is doing. Deterministic / Nondeterministic function flags doesn't change anything.

Of course - it's just a simple exercise, and I'm not an outstanding/certified Microsoft systems theorist.

I'm afraid - if you are not able to rewrite your CLR - you won't get much more, though, maybe parallel queries could speed things up ?