Home > Enterprise >  attempting a replace function on sql but with wildcards
attempting a replace function on sql but with wildcards

Time:11-26

this is my first time posting so hopefully I get all the information needed across correctly.

I'm attempting to do an update replace statement on a row where there is a set of characters but a wildcard statement won't work, below is a select statement example

SELECT replace(column, '[aA][bB][cC]-', '')
FROM table
where column like '%[aA][bB][cC]-%'

an example row would be "abc-123 aBc-123 abC-abc Def-123", but I need it to return "123 123 abc"

this is a basic example, but I'm trying to get rid of a set of 3 characters and a "-" character anywhere in a string. the abc could change to def but the "-" character will always come after.

I've done some googling and can't find an appropriate solution as most examples will only remove one example of abc- where I need to get rid of all versions. I'm running version 12.0 of sql server (SQL Server 2014) so I think some functions I wouldn't be able to use.

I think the closest example I could find was Using Wildcard For Range of Characters In T-SQL but I can't use the translate function.

Edit: below is an example of a created table

 CREATE TABLE String_Removal_Example( 
someValue VARCHAR(100) 
) 
INSERT INTO String_Removal_Example (someValue) VALUES ('abc-123') 
INSERT INTO String_Removal_Example (someValue) VALUES ('abc-123 ABC-123') 
INSERT INTO String_Removal_Example (someValue) VALUES ('abc-123 ABc-123 123-ABC DEF-123')

select statement brings back

someValue
abc-123
abc-123 ABC-123
abc-123 ABc-123 123-ABC DEF-123

Edit2: If this isn't possible to do in this manner, a possible alternative would be to remove 3 characters and the - character. I've tried the below

select Substring (someValue, Charindex( '-', someValue )   1, Len(someValue)) From String_Removal_Example

but this returns the below, which is only affecting the first instance of nnn-.

123
123 ABC-123
123 ABc-123 123-ABC DEF-123

Edit3: The string I need to replace is nnn- for clarification. I was trying for the [aA][bB][cC]- format in case I needed to change it. It will always be 3 characters followed by a "-" Character

CodePudding user response:

You can create a function like this: (I named it ThreeLetters, but what's in a name...)

CREATE FUNCTION [dbo].[ThreeLetters] 
(
    @p0 VARCHAR(MAX)
)
RETURNS VARCHAR(MAX)
AS
BEGIN
    DECLARE @p1 VARCHAR(20) = '%[A-Z][A-Z][A-Z]-%';
    DECLARE @Result VARCHAR(MAX) = @p0;
    DECLARE @pos INT = PATINDEX(@p1,@Result COLLATE SQL_Latin1_General_CP1_CS_AS);
    DECLARE @pos2 INT ;
    DECLARE @i INT =0;

    WHILE @pos>0 and @i<10
    BEGIN
       SET @pos2 = CHARINDEX('-',@Result,@pos)-@pos;
       SELECT @Result = SUBSTRING(@Result,1,@pos-1) SUBSTRING(@Result, @pos @pos2 1, len(@Result));
       SET @pos = PATINDEX(@p1,@Result COLLATE SQL_Latin1_General_CP1_CS_AS);
       SET @i = @i   1;
    END 
    RETURN @Result
END

output with your sample data:

someValue (No column name)
abc-123 123
abc-123 ABC-123 123 123
abc-123 ABc-123 123-ABC DEF-123 123 123 123-ABC 123

see DBFIDDLE

P.S. The check for @i was introduced while debugging. It might not be needed, but was tooo lazy to remove it.

P.P.S I am removing ThreeLetters followed by a -, that is why 123-ABC remains unchanged.

CodePudding user response:

Here's an XML trick that'll also work in Sql Server 2014

(In Sql Server 2016 and beyond, the string_split function would be better.)

declare @String_Removal_Example table ( 
 someValue varchar(100) not null
);

insert into @String_Removal_Example (someValue) VALUES 
  ('abc-123') 
, ('abc-123 ABC-123') 
, ('abc-123 ABC-123 123-Abc DEF-123');

select ca.value as someValue
from @String_Removal_Example t
cross apply 
(
  select rtrim((
    select node.n.value('.','varchar(30)') ' '
    from 
    (
      select cast('<x><a>'   replace(replace(t.someValue, 
                  ' ', '</n><a>'), 
                  '-', '</a><n>')  
                  '</n></x>' as xml) as x
    ) q
    cross apply x.nodes ('/x/n') AS node(n)
    for xml path('') 
   )) as value
) ca
where lower(t.someValue) like '%[a-z0-9]-[a-z0-9]%';
GO
| someValue       |
| :-------------- |
| 123             |
| 123 123         |
| 123 123 Abc 123 |

db<>fiddle here

CodePudding user response:

Please try the following solution. It is a variation of @LukStorms solution.

It is based on XML and XQuery. Their data model is based on ordered sequences.

XML allows us to tokenize the input string.

It is looking for 'nnn-' strings and filters them out.

For example, 2nd row is converted to XML like this:

<root>
  <r>abc-</r>
  <r>123</r>
  <r>ABC-</r>
  <r>123</r>
</root>

SQL

-- DDL and sample data population, start
declare @String_Removal_Example table ( 
 someValue varchar(100) not null
);
insert into @String_Removal_Example (someValue) VALUES 
  ('abc-123') 
, ('abc-123 ABC-123') 
, ('abc-123 ABC-123 123-Abc DEF-123');
-- DDL and sample data population, end

SELECT someValue AS [before], [after]
FROM @String_Removal_Example
CROSS APPLY (SELECT TRY_CAST('<root><r><![CDATA['   
         REPLACE(REPLACE(someValue, '-', '- '), SPACE(1), ']]></r><r><![CDATA[')   
         ']]></r></root>' AS XML)
    .query('data(/root/r[not(contains(text()[1],"-"))])')
    .value('.','VARCHAR(255)')) AS t1(after);

Output

 --------------------------------- ----------------- 
|             before              |      after      |
 --------------------------------- ----------------- 
| abc-123                         | 123             |
| abc-123 ABC-123                 | 123 123         |
| abc-123 ABC-123 123-Abc DEF-123 | 123 123 Abc 123 |
 --------------------------------- ----------------- 

CodePudding user response:

SELECT replaceAll(column, '[aA][bB][cC]-', '') FROM table where column like '%[aA][bB][cC]-%'

CodePudding user response:

This is basically dividing up the string where there are spaces, then looking for "words" that have a '-' in the 4th position, removing the prefix and then stitching it back together:

SELECT s.someValue,
    STUFF((SELECT ' '   SUBSTRING(value, 5, LEN(value) - 4)
    FROM STRING_SPLIT(someValue, ' ') 
             WHERE CHARINDEX('-', value) = 4 
             FOR XML PATH('')),1,1,'') a
FROM String_Removal_Example s

CodePudding user response:

As there is no regex replace in SQL Server, I think the only way you can reliably use a pattern replace is through an iterative process. One such way is with a recursive CTE:

WITH Data AS
(   SELECT  SomeValue
    FROM (VALUES 
            ('abc-123'), 
            ('abc-123 ABC-123'), 
            ('abc-123 aBc-123 abC-abc Def-123'),
            ('abc-123 ABCD-123'), 
            ('abc-123 A$C-123')
        ) x (SomeValue)
), RegexReplace AS
(   SELECT  SomeValue, NewValue = CONVERT(VARCHAR(100), Data.SomeValue), Level = 1
    FROM    Data
    UNION ALL
    SELECT  SomeValue, CONVERT(VARCHAR(100), STUFF(NewValue, PATINDEX('% [A-z][A-z][A-z]-%', CONCAT(' ', NewValue)), 4, '')), Level   1
    FROM    RegexReplace AS d
    WHERE   PATINDEX('% [A-z][A-z][A-z]-%', CONCAT(' ', NewValue)) > 0
), RankedData AS
(   SELECT  *, RowNumber = ROW_NUMBER() OVER(PARTITION BY rr.SomeValue ORDER BY rr.Level DESC)
    FROM    RegexReplace AS rr
)
SELECT  rd.SomeValue, rd.NewValue
FROM    RankedData AS rd
WHERE   rd.RowNumber = 1;

Which gives:

SomeValue New Value
abc-123 123
abc-123 ABC-123 123 123
abc-123 aBc-123 abC-abc Def-123 123 123 abc 123
abc-123 A$C-123 123 A$C-123
abc-123 ABCD-123 123 ABCD-123

This basically uses PATINDEX() to find the first instance of the pattern, then STUFF() to remove the 4 characters after the index found. This process then repeats until the PATINDEX() fails to find a match.

Probably also of note is that when doing the PATINDEX() I am adding a space to the start of the string, and including a space at the start of the pattern. This ensures that it is limited to only 3 characters followed by a hyphen, if it is more than 3 (e.g. ABCD-123) the entire string is left intact rather than leaving A123 as shown in the last row above. Adding the space to the start of the string being searched ensures that the pattern is still found even if it is at the start of the string.

  • Related