Home > OS >  Extracting a string between a known string and a delimiter T-SQL
Extracting a string between a known string and a delimiter T-SQL

Time:02-08

I need help extracting parts of a very long string which is between a known string and a delimiter. I am using software which scans for an inputted pattern and pulls the pattern into a very long string with other information that is not needed. Within the string, I need to extract what we call a ReferenceNumber. However, if multiple patterns have been found, they too will also be inputted within the same string.

Here is some sample dummy data (I will only show part of the string):

Id PatternsFound ReferenceNumber
1 1 {Name:09483884-1,Value
2 2 {Name:AD103758,Value:DetectedValues:{Name:04959489-00,Value
3 1 {Name:09847364-000000001,Value

In the ReferenceNumber column, I need to extract everything after "{Name:" and everything before ",Value".

As you can see in Id column, row two, 2 ReferenceNumber have been found and they need to be extracted into two new columns. 5 different patters could all be in the same sting which would need 5 columns.

Is this possible and can anyone help me?

Thanks

CodePudding user response:

It looks like you should be able to split the string using the :, remove unecessary elements and then aggregate using a conditional case expression to pivot your 5 columns.

If the ordering of the columns doesn't matter you can use string_split(), otherwise you can use a custom split function such as the one included below:

with split as (
    select id, Replace(s.value, ',Value','') ref,
      Row_Number() over(partition by id order by s.seq) seq
    from t
    cross apply dbo.SplitString(referencenumber,':') s
    where s.value like '%,value%'
)
select id, Count(*) PatternsFound,
    max(case when seq=1 then ref end) Ref1,
    max(case when seq=2 then ref end) Ref2,
    max(case when seq=3 then ref end) Ref3,
    max(case when seq=4 then ref end) Ref4,
    max(case when seq=5 then ref end) Ref5
from split
group by id

Example split string function:

create function dbo.SplitString(@string varchar(1000), @Delimiter varchar(10))
returns table
as
   return(  
      select j.[value], 1   Convert(tinyint,j.[key]) Seq 
      from OpenJson(Concat('["',replace(@string,@delimiter, '","'),'"]')) j
   );

Example Fiddle

CodePudding user response:

Please try the following solution.

It will work starting from SQL Server 2017 onwards.

No UDFs.

The idea is to convert the referencenumber column into a JSON array.

For example, for ID=2, it becomes ["AD103758","04959489-00"].

After that it is easy to shred it into columns via JSON_VALUE() .

SQL

-- DDL and sample data population, start
DECLARE @tbl TABLE (id INT IDENTITY PRIMARY KEY, referencenumber VARCHAR(1000))
insert into @tbl (referencenumber) VALUES
('{Name:09483884-1,Value'),
('{Name:AD103758,Value:DetectedValues:{Name:04959489-00,Value');
-- DDL and sample data population, end

WITH rs AS
(
    SELECT ID
        , j = '["'   REPLACE(STRING_AGG(value, '","'),'Name:', '')   '"]'
    FROM @tbl
    CROSS APPLY STRING_SPLIT(REPLACE(referencenumber, '{', ','), ',')
    WHERE value LIKE 'Name:%'
    GROUP BY ID
)
SELECT rs.id
    , Ref1 = JSON_VALUE(j,'$[0]')
    , Ref2 = JSON_VALUE(j,'$[1]')
    , Ref3 = JSON_VALUE(j,'$[2]')
    , Ref4 = JSON_VALUE(j,'$[3]')
    , Ref5 = JSON_VALUE(j,'$[4]')
FROM rs;

Output

 ---- ------------ ------------- ------ ------ ------ 
| id |    Ref1    |    Ref2     | Ref3 | Ref4 | Ref5 |
 ---- ------------ ------------- ------ ------ ------ 
|  1 | 09483884-1 | NULL        | NULL | NULL | NULL |
|  2 | AD103758   | 04959489-00 | NULL | NULL | NULL |
 ---- ------------ ------------- ------ ------ ------ 

CodePudding user response:

Here's a solution that uses an UDF fnPattern_Split.
Which can split strings on a pattern. (Taken from here)

Then the match is transformed into a json format.
So that JSON_VALUE can be used on it.

SELECT *
FROM
(
    select id
    , concat('RefNum', a.rn) AS COL
    , JSON_VALUE(a.js, '$.Name') as Name
    from test t
    outer apply (
    select 
      row_number() over (order by spl.ordinal) as rn
    , replace(replace(spl.value, '{Name:', '{"Name":"'), ',', '"}') as js
    from dbo.fnPattern_Split(t.ReferenceNumber, '{Name:%,') spl
    where spl.match = 1
  ) a
) Src
PIVOT (
  MAX(Name)
  FOR COL IN ([RefNum1], [RefNum2])
) Pvt;
id RefNum1 RefNum2
1 09483884-1 null
2 AD103758 04959489-00
3 09847364-000000001 null

Test on db<>fiddle here

  •  Tags:  
  • Related