I need help extracting parts of a very long string which is between a known string and a delimiter. I am using software which scans for an inputted pattern and pulls the pattern into a very long string with other information that is not needed. Within the string, I need to extract what we call a ReferenceNumber. However, if multiple patterns have been found, they too will also be inputted within the same string.
Here is some sample dummy data (I will only show part of the string):
Id | PatternsFound | ReferenceNumber |
---|---|---|
1 | 1 | {Name:09483884-1,Value |
2 | 2 | {Name:AD103758,Value:DetectedValues:{Name:04959489-00,Value |
3 | 1 | {Name:09847364-000000001,Value |
In the ReferenceNumber column, I need to extract everything after "{Name:" and everything before ",Value".
As you can see in Id column, row two, 2 ReferenceNumber have been found and they need to be extracted into two new columns. 5 different patters could all be in the same sting which would need 5 columns.
Is this possible and can anyone help me?
Thanks
CodePudding user response:
It looks like you should be able to split the string using the :
, remove unecessary elements and then aggregate using a conditional case expression to pivot your 5 columns.
If the ordering of the columns doesn't matter you can use string_split()
, otherwise you can use a custom split function such as the one included below:
with split as (
select id, Replace(s.value, ',Value','') ref,
Row_Number() over(partition by id order by s.seq) seq
from t
cross apply dbo.SplitString(referencenumber,':') s
where s.value like '%,value%'
)
select id, Count(*) PatternsFound,
max(case when seq=1 then ref end) Ref1,
max(case when seq=2 then ref end) Ref2,
max(case when seq=3 then ref end) Ref3,
max(case when seq=4 then ref end) Ref4,
max(case when seq=5 then ref end) Ref5
from split
group by id
Example split string function:
create function dbo.SplitString(@string varchar(1000), @Delimiter varchar(10))
returns table
as
return(
select j.[value], 1 Convert(tinyint,j.[key]) Seq
from OpenJson(Concat('["',replace(@string,@delimiter, '","'),'"]')) j
);
CodePudding user response:
Please try the following solution.
It will work starting from SQL Server 2017 onwards.
No UDFs.
The idea is to convert the referencenumber column into a JSON array.
For example, for ID=2, it becomes ["AD103758","04959489-00"]
.
After that it is easy to shred it into columns via JSON_VALUE()
.
SQL
-- DDL and sample data population, start
DECLARE @tbl TABLE (id INT IDENTITY PRIMARY KEY, referencenumber VARCHAR(1000))
insert into @tbl (referencenumber) VALUES
('{Name:09483884-1,Value'),
('{Name:AD103758,Value:DetectedValues:{Name:04959489-00,Value');
-- DDL and sample data population, end
WITH rs AS
(
SELECT ID
, j = '["' REPLACE(STRING_AGG(value, '","'),'Name:', '') '"]'
FROM @tbl
CROSS APPLY STRING_SPLIT(REPLACE(referencenumber, '{', ','), ',')
WHERE value LIKE 'Name:%'
GROUP BY ID
)
SELECT rs.id
, Ref1 = JSON_VALUE(j,'$[0]')
, Ref2 = JSON_VALUE(j,'$[1]')
, Ref3 = JSON_VALUE(j,'$[2]')
, Ref4 = JSON_VALUE(j,'$[3]')
, Ref5 = JSON_VALUE(j,'$[4]')
FROM rs;
Output
---- ------------ ------------- ------ ------ ------
| id | Ref1 | Ref2 | Ref3 | Ref4 | Ref5 |
---- ------------ ------------- ------ ------ ------
| 1 | 09483884-1 | NULL | NULL | NULL | NULL |
| 2 | AD103758 | 04959489-00 | NULL | NULL | NULL |
---- ------------ ------------- ------ ------ ------
CodePudding user response:
Here's a solution that uses an UDF fnPattern_Split
.
Which can split strings on a pattern. (Taken from here)
Then the match is transformed into a json format.
So that JSON_VALUE
can be used on it.
SELECT * FROM ( select id , concat('RefNum', a.rn) AS COL , JSON_VALUE(a.js, '$.Name') as Name from test t outer apply ( select row_number() over (order by spl.ordinal) as rn , replace(replace(spl.value, '{Name:', '{"Name":"'), ',', '"}') as js from dbo.fnPattern_Split(t.ReferenceNumber, '{Name:%,') spl where spl.match = 1 ) a ) Src PIVOT ( MAX(Name) FOR COL IN ([RefNum1], [RefNum2]) ) Pvt;
id RefNum1 RefNum2 1 09483884-1 null 2 AD103758 04959489-00 3 09847364-000000001 null
Test on db<>fiddle here