Home > Net >  Find and eliminate duplicates in strings
Find and eliminate duplicates in strings

Time:07-25

I have a SQL Server 2012 table with a column of type [varchar](max) NULL.

It basically contains a list of email addresses separated by a semicolon

[email protected]; [email protected]; [email protected]

Without going into a discussion about how it would be better to resolve the source data that came to create this situation (which is perfectly reasonable advice), is there any clever way to eliminate said duplicates?

[email protected]; [email protected]

The only rules I have is that

  • There will be a semicolon between addresses in the string
  • But not at the start, and not at the end
  • The column may also be NULL
  • Sort order is irrelevant

Slicing and dicing substrings is fine, but I don't see how to do substitution in SQL

CodePudding user response:

Since you are 2012, here is an XML approach

Declare @YourTable table (ID int,SomeCol varchar(500))
Insert Into @YourTable values
 (1,'[email protected]; [email protected]; [email protected]')
,(2,'[email protected]')

Select A.ID
      ,B.NewValue
 From @YourTable A
 Cross Apply (
               Select NewValue = stuff( ( Select Distinct '; '  ltrim(rtrim(i.value('(./text())[1]', 'varchar(max)')))
                                            From  ( values (cast('<x>'   replace((Select replace(SomeCol,';','§§Split§§') as [*] For XML Path('')),'§§Split§§','</x><x>') '</x>' as xml).query('.'))) A(x)
                                            Cross Apply x.nodes('x') X(i)
                                            For XML Path ('')),1, 2,'') 
             ) B

Results

ID  NewValue
1   [email protected]; [email protected]
2   [email protected]

CodePudding user response:

As you have hinted in your question, it would be much better to not store CSV data like this in your table. Assuming you are using SQL Server 2017 or later, you may use a combination of STRING_SPLIT and STRING_AGG here:

WITH cte AS (
    SELECT DISTINCT LTRIM(value) AS value
    FROM yourTable
    CROSS APPLY STRING_SPLIT(input, ';')
)

SELECT STRING_AGG(value, '; ') AS input
FROM cte;

Demo

  • Related