Searching for multiple patterns in a string in T-SQL-CodePudding

In t-sql my dilemma is that I have to parse a potentially long string (up to 500 characters) for any of over 230 possible values and remove them from the string for reporting purposes. These values are a column in another table and they're all upper case and 4 characters long with the exception of two that are 5 characters long.

Examples of these values are:

USFRI
PROME
AZCH
TXJS
NYDS
XVIV. . . . .

Example of string before:

"Offered to XVIV and USFRI as back ups.  No response as of yet."

Example of string after:

"Offered to and as back ups.  No response as of yet."

Pretty sure it will have to be a UDF but I'm unable to come up with anything other than stripping ALL the upper case characters out of the string with PATINDEX which is not the objective.

CodePudding user response：

This is unavoidably cludgy but one way is to split your string into rows, once you have a set of words the rest is easy; Simply re-aggregate while ignoring the matching values*:

with t as (
    select 'Offered to XVIV and USFRI as back ups. No response as of yet.' s 
    union select 'Another row AZCH and TXJS words.'
), v as (
    select * from (values('USFRI'),('PROME'),('AZCH'),('TXJS'),('NYDS'),('XVIV'))v(v)
)

select t.s OriginalString, s.Removed
from t
cross apply (
    select String_Agg(j.[value], ' ') within group(order by Convert(tinyint,j.[key])) Removed
    from OpenJson(Concat('["',replace(s, ' ', '","'),'"]')) j
    where not exists (select * from v where v.v = j.[value])
)s;

_{* Requires a fully-supported version of SQL Server.}

CodePudding user response：

I suggest creating a table (either temporary or permanent), and loading these 230 string values into this table. Then use it in the following delete:

DELETE
FROM yourTable
WHERE col IN (SELECT col FROM tempTable);

If you just want to view your data sans these values, then use:

SELECT *
FROM yourTable
WHERE col NOT IN (SELECT col FROM tempTable);

CodePudding user response：

build a function to do the cleaning of one sentence, then call that function from your query, something like this SELECT Col1, dbo.fn_ReplaceValue(Col1) AS cleanValue, * FROM MySentencesTable. And your fn_ReplaceValue will be something like the code below, you could also create the table variable outside the function and pass it as parameter to speed up the process, but this way is all self contained.

SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GO

CREATE FUNCTION fn_ReplaceValue(@sentence VARCHAR(500))
RETURNS VARCHAR(500)
AS
BEGIN
    DECLARE @ResultVar VARCHAR(500)
    DECLARE @allValues TABLE (rowID int, sValues VARCHAR(15))
    DECLARE @id INT = 0

    DECLARE @ReplaceVal VARCHAR(10)
    DECLARE @numberOfValues INT = (SELECT COUNT(*) FROM MyValuesTable)

    --Populate table variable with all values
    INSERT @allValues 
    SELECT ROW_NUMBER()  OVER(ORDER BY MyValuesCol) AS rowID, NationalIDNumber
    FROM MyValuesTable
    
    SET @ResultVar = @sentence

    WHILE (@id <= @numberOfValues)
    BEGIN
        SET @id = @id   1
        SET @ReplaceVal = (SELECT sValue FROM @allValues WHERE rowID = @id)

        SET @ResultVar  =  REPLACE(@ResultVar, @ReplaceVal, SPACE(0))
    END

    RETURN @ResultVar
END
GO