Home > Enterprise >  How to extract alpha or numbers from a string within a SELECT statement WITHOUT a function call
How to extract alpha or numbers from a string within a SELECT statement WITHOUT a function call

Time:12-22

I am using the following SQL to create a function to strip out characters or numbers.

CREATE FUNCTION [dbo].[fn_StripCharacters]
(
    @String NVARCHAR(MAX), 
    @MatchExpression VARCHAR(255)
)
RETURNS NVARCHAR(MAX)
AS
BEGIN
    SET @MatchExpression =  '%[' @MatchExpression ']%'

    WHILE PatIndex(@MatchExpression, @String) > 0
        SET @String = Stuff(@String, PatIndex(@MatchExpression, @String), 1, '')

    RETURN @String

END

I am calling it like this to extract the alpha characters and then the numeric characters into two sort fields:

SELECT
    ...
    (SELECT dbo.fn_StripCharacters(PD.District, '^a-z')) AS Sort1,
    CAST((SELECT dbo.fn_StripCharacters(PD.District, '^0-9')) AS INT) AS Sort2,
    ...
FROM

I am searching a LOT of Postcode District Records and the constant calls to the function are causing a comparatively big delay. Is there a way of reproducing this functionality without a function call? Can it be incorporated into the SELECT statement somehow?

SQL Server Version is 2017

CodePudding user response:

You can try something like this:

 Create Function [dbo].[fnCleanString] (
        @inputString varchar(8000)
      , @stringPattern varchar(50) = '[0-9a-zA-Z]'
        )
Returns table
   With schemabinding
     As
 Return

   With t(n)
     As (
 Select t.n 
   From (
 Values (0), (0), (0), (0), (0), (0), (0), (0), (0), (0)
      , (0), (0), (0), (0), (0), (0), (0), (0), (0), (0)) As t(n)
        )
      , iTally (n)
     As (
 Select Top (len(@inputString)) 
        checksum(row_number() over(Order By @@spid)) 
   From t t1, t t2, t t3                                -- 8000 rows
        )
 Select v.inputString
      , outputString = (Select substring(v.inputString, it.n, 1)
                          From iTally                     it
                         Where substring(v.inputString, it.n, 1) Like @stringPattern
                           For xml Path(''), Type).value('.', 'varchar(8000)')
   From (Values (@inputString)) As v(inputString);
GO

And call it like this:

Declare @testData table (AlphaNumeric varchar(100));
 Insert Into @testData (AlphaNumeric)
 Values ('a1b2c3d4'), ('5e6f7g8i');

 Select *
   From @testData                       td
  Cross Apply dbo.fnCleanString(td.AlphaNumeric, '[^A-Z]') cs
  Cross Apply dbo.fnCleanString(td.AlphaNumeric, '[^0-9]') cs2;

This works with VARCHAR - but you can easily change it to return NVARCHAR if that is what you really need. Just be aware that it is written to work with a fixed length string and the maximum size for NVARCHAR is 4000.

You also need to be aware of collation and how that will affect the results.

CodePudding user response:

How to extract alpha or numbers from a string within a SELECT statement WITHOUT a function call?

You can't, because simple string operations like REPLACE() are also function calls. And, user-defined functions are compiled by SQL Server. They perform pretty well.

But your problem is not the function's performance itself, but rather how often you must use it. You knew that.

Here's a possible way to speed up your postcode-grinding task: put persisted computed columns on your table. You can even index them.

Here's how to do that.

  1. Tell SQL Server to use schemabinding with your stored function. It needs to know that a table definition in your schema depends on the function. To do that add WITH SCHEMABINDING to your function definition.

    ...
    RETURNS NVARCHAR(MAX)
    WITH SCHEMABINDING
    AS
    BEGIN
    ...
    
  2. Add two computed, persisted, columns to your table.

    ...
    ALTER TABLE postcode
           ADD letters   
           AS (dbo.fn_StripCharacters(postcode, '^A-Z'))
           PERSISTED;
     ALTER TABLE dbo.postcode
           ADD numbers
           AS (CAST(dbo.fn_StripCharacters(postcode, '^0-9') AS INT)) 
           PERSISTED;
    
  3. You can put indexes on the computed columns if you need them.

    CREATE INDEX numbers ON postcode  (numbers DESC)
    
  4. Now you can insert, update, or delete your non-computed columns as you wish. SQL Server evaluates your stored functions just once for each row as it is inserted or updated. Your functions still get evaluated, but not when you SELECT from your table.

  5. And you can use the computed columns as you wish

    SELECT * FROM postcode ORDER BY numbers DESC
    

Here's a db<>fiddle demonstrating this.

  • Related