Home > database >  How to STRING_SPLIT where there are too many commagules?
How to STRING_SPLIT where there are too many commagules?

Time:12-03

I have a question about the STRING_SPLIT. I need to separate phrases inside a String, separated by comma. The problem is that some of these phrases have around them, more commas.

This is an example:

  1. Archiviazione, 2. Conservazione in archivi**,** ad accesso selezionato, 3. Conservazione in contenitori muniti di serratura, 4. Controllo degli accessi fisici, 5. Controllo degli accessi logici, 6. Custodia atti e documenti, 7. Formazione degli incaricati, 8. Sicurezza dei siti web

As you can see, within item 2 there is a comma that hinders the process of division by comma. How can I overcome this situation?

Another question would be: is there a way to pass a parameter to the String_Split where the number with dot next can serve as a separator instead of the comma?

Thank you very much from now on!

With the query:

select * 
from string_split('1. Archiviazione, 2. Conservazione in archivi, ad accesso selezionato, 3. Conservazione in contenitori muniti di serratura, 4. Controllo degli accessi fisici, 5. Controllo degli accessi logici, 6. Custodia atti e documenti, 7. Formazione degli incaricati, 8. Sicurezza dei siti web', ',')

I got this result:

enter image description here

CodePudding user response:

A little ugly, and certainly some risk involved, but using JSON may be an alternative

Select value
 From  ( 
        Select [key]
              ,Value   = case when trim(Value) like '[0-9]%' then trim(Value) else null end
                          case when lead(trim(Value),1,'') over (order by [key]) like '[0-9]%' 
                              then ''
                              else ', ' lead(trim(Value),1,'') over (order by [key]) 
                          end
        From  OpenJSON( '["' replace(string_escape('1. Archiviazione, 2. Conservazione in archivi, ad accesso selezionato, 3. Conservazione in contenitori muniti di serratura, 4. Controllo degli accessi fisici, 5. Controllo degli accessi logici, 6. Custodia atti e documenti, 7. Formazione degli incaricati, 8. Sicurezza dei siti web','json'),',','","') '"]' )
       ) A
Where Value is not null
Order By [key]

Results

1. Archiviazione
2. Conservazione in archivi, ad accesso selezionato
3. Conservazione in contenitori muniti di serratura
4. Controllo degli accessi fisici
5. Controllo degli accessi logici
6. Custodia atti e documenti
7. Formazione degli incaricati
8. Sicurezza dei siti web 

CodePudding user response:

To the other answers, it is of course better to separate the data in a more appropriate fashion on input. When you are faced with existing data, STRING_SPLIT will not do what you are looking for so you will need to "parse the string manually". I search back for SQL answers prior to the availability of STRING_SPLIT are good references, however some other features have come out since that makes this a little nicer.

Because your format is based on incrementing numbers, you can take advantage of the following steps: a) Generate a "numbers" table/subquery and make a string for each number , eg. '1.' '2.' '3.', etc... b) Find the PATINDEX of each number as well as the PATINDEX of the "next" number via LEAD c) Use the two indices to create parameters for SUBSTRING and do some string cleanup

See below for a sample query that will split the substrings. You could replace @theString with an iTVF parameter or however you need to apply this in your code.

    DECLARE @theString VARCHAR(MAX) = '1. Archiviazione, 2. Conservazione in archivi**,** ad accesso selezionato, 3. Conservazione in contenitori muniti di serratura, 4. Controllo degli accessi fisici, 5. Controllo degli accessi logici, 6. Custodia atti e documenti, 7. Formazione degli incaricati, 8. Sicurezza dei siti web'

--SET @theString = '1. Apples, 2. Oranges, 3. Peaches, 4. Pears, 5. Cheese, 6. Kiwis, 7. Toast'
;WITH CTE_Nums AS
(
    SELECT CONCAT(', ',CAST([NumValue] AS VARCHAR(10)),'. ') [NumString],[NumValue]
    FROM (
        SELECT ones.n   tens.n * 10   huns.n * 100 [NumValue]
        FROM (VALUES(0),(1),(2),(3),(4),(5),(6),(7),(8),(9)) ones(n),
             (VALUES(0),(1),(2),(3),(4),(5),(6),(7),(8),(9)) tens(n),
             (VALUES(0),(1),(2),(3),(4),(5),(6),(7),(8),(9)) huns(n)
        ) n1
)
SELECT 
    N.NumValue,
    N.Pos1,
    f1.Pos1 pos1f,
    N.Pos2,
    f1.Pos2 pos2f,
    f1.Pos2 - N.Pos1 SubLen,
    SUBSTRING(@theString,f1.Pos1,(f1.Pos2 - f1.Pos1 ) ) SubStr
FROM
    ( SELECT 
        nn.NumValue,
        PATINDEX('%' nn.[NumString] '%',@theString) Pos1,
        PATINDEX('%'  LEAD(nn.[NumString]) OVER ( ORDER BY nn.NumValue ASC )   '%',@theString) Pos2
        FROM 
        CTE_Nums nn
    ) N
    CROSS APPLY (SELECT IIF(N.NumValue = 1,N.Pos1,N.Pos1 LEN(',-')) Pos1,
    IIF(N.Pos2 > 0,N.Pos2,LEN(@theString)) Pos2 ) f1
WHERE
    N.Pos1 > 0 OR N.NumValue = 1
ORDER BY 
    N.NumValue ASC

I have left the other columns but the [SubStr] column is the one you want that returns:

  1. Archiviazione
  2. Conservazione in archivi**,** ad accesso selezionato
  3. Conservazione in contenitori muniti di serratura
  4. Controllo degli accessi fisici
  5. Controllo degli accessi logici
  6. Custodia atti e documenti
  7. Formazione degli incaricati
  8. Sicurezza dei siti we
  • Related