Home > database >  Keep only allowed characters in a string
Keep only allowed characters in a string

Time:10-26

If I know which characters I do not want to keep in my string I can easily remove them with REPLACE:

SELECT REPLACE(
         REPLACE(
           REPLACE(
             'String with characters like #§$ I do not want to keep', 
             '#', ''
           ), '§', ''
         ), '$', ''
       ) AS repl_string

--- String with characters like  I do not want to keep

But, what if I want to remove every character which is not part of a positive list? With a regex I would so something like s/[^a-zA-Z0-9 :.]//g (assuming that I would like to allow only letters, numbers, a space , a dot . or a colon :)

I am looking for a solution for Microsoft SQL Server 2016

CodePudding user response:

This answer uses a very similar solution to the answer I linked above, however, as the OP is using SQL Server 2016, they can't use STRING_AGG. As such you'll need to use the "old" FOR XML PATH (and STUFF) method to reaggregate the string:

CREATE OR ALTER FUNCTION [dbo].[PatternCharacterReplace_XML] (@String varchar(8000), @Pattern varchar(100), @ReplacementCharacter varchar(1)) 
RETURNS table
AS RETURN
    WITH N AS(
        SELECT N
        FROM (VALUES(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL))N(N)),
    Tally AS(
        SELECT TOP(LEN(@String))
               ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) AS I
        FROM N N1, N N2, N N3, N N4)
    SELECT (SELECT CASE WHEN V.C LIKE @Pattern THEN @ReplacementCharacter ELSE V.C END
            FROM Tally T
                 CROSS APPLY (VALUES(SUBSTRING(@String,T.I,1)))V(C)
            ORDER BY T.I
            FOR XML PATH(''),TYPE).value('(./text())[1]','varchar(8000)') AS ReplacedString;
GO


SELECT *
FROM dbo.PatternCharacterReplace_XML('String with characters like #§$ I do not want to keep','[^A-Za-z0-9 .:]','');

Note that this particular version uses varchar values. If your value is an nvarchar I would suggest creating a separate version that uses that data type for the parameters.

  • Related