Home > Back-end >  separate a date-delimited text string
separate a date-delimited text string

Time:11-30

I have a text string in the following format:

09/05/2021: La persona seleccionada se encuentra en seguridad 05/09/2021: Se envia un nuevo grupo de candidatos 07/11/2021: El candidato contratado no inicia 

So I want to separate each comment from the string by each date. How do i do that?

The result should be the following:

DATE COMMENT
09/05/2021 La persona seleccionada se encuentra en seguridad
05/09/2021 Se envia un nuevo grupo de candidatos
07/11/2021 El candidato contratado no inicia

CodePudding user response:

I don't like solutions that rely on the order of output from SPLIT, because the manual states that it is not guaranteed to be the order of input.. In practice I believe that it uses .NET's split under the hood, and probably will output in order..

Caveat emptor!

declare @t VARCHAR(200) = '09/05/2021: La persona seleccionada se encuentra en seguridad 05/09/2021: Se envia un nuevo grupo de candidatos 07/11/2021: El candidato contratado no inicia';

WITH x AS (
  SELECT 
    RIGHT(LAG(value) OVER(order by (select null)), 10) as d, 
    SUBSTRING(value, 2, LEN(value) - CASE WHEN value LIKE '%__/__/____' THEN 11 ELSE 1 END) as t  
  FROM 
    string_split(@t, ':') 
)
SELECT d, t FROM x WHERE d IS NOT NULL

This splits on : and then retrieves the date off the end of the "previous" value for pairing with the text from the value of the current. It examines whether there is a date on the end before deciding whether to substring up to the string end, or up to 10 chars back from the string end


If you want to do this safer in SQLS, you might do better to use a recursive CTE and SUBSTRING and CHARINDEX to look for the positions of ': ' progressively, and then cut from 10 chars back, up to the next index.

That looks like this:

declare @t VARCHAR(200) = '09/05/2021: La persona seleccionada se encuentra en seguridad 05/09/2021: Se envia un nuevo grupo de candidatos 07/11/2021: El candidato contratado no inicia';

WITH x(s, sp, p) as (
    SELECT s, 1, CHARINDEX(': ', s) FROM (SELECT @t AS s) xx
    UNION ALL
    SELECT s, p   1, CHARINDEX(': ', s, p   1)
    FROM X
    WHERE p > 0
)
SELECT 
    SUBSTRING(s, sp - 11, 10) d, 
    SUBSTRING(s, sp   1, CASE WHEN p > 0 THEN p - sp - 12 ELSE LEN(s) END) c
FROM x
WHERE sp > 1

The recursive CTE repeatedly uses CHARINDEX to find ': ' and effectively gives a starting position (sp) and ending position (p) each time it goes round. By using the position from last time as the starting position this time, the string is searched incrementally. From these an adjustment of index to factor the date can be used, and similarly for the comment. Do a select * FROM x if you want to see more info


It would be much safer to do this in the front end language, for example C# does guarantee split order, this logic is similar:

var a = string.Split(": ");
for(int i = 1; i < a.Length - 1; i  )
  Console.WriteLine( (a[i-1][^10..], a[i][..^11]) );

Console.WriteLine( (a[^2][^10..], a[^1]) );

This variation splits on ": ", something SQLS cannot do, which reduces the need to trim extraneous spaces from the start. After that a loop processes all but the last line (it has no date on the end so must be treated differently) starting from 1 (so it can safely reference a previous element).

  • a[i-1][^10..] means array a, element string at i-1, substring 10 from end to 0 from end i.e. last 10 chars
  • a[i][..^11] means array a, element i, substring from start to 11 chars back from end i.e. all except last 11 chars
  • a[^2][^10..] means array a, element 2 from end i.e. last but one, last 10 chars
  • a[^1] means array a, last element of
  • Related