Home > Software engineering >  Substring in nested query
Substring in nested query

Time:10-10

I have a nvarchar column in table that stores data as '{CRLF}' separated values. Those values can be in the format 'namespace#name$value' but also in other formats.

I need to extract separated namespace, name and value and skip data in other formats. I can do it with a query:

select 
  SUBSTRING([value], 1, CHARINDEX('#',[value]) - 1) as paramNamespace,
  SUBSTRING([value], CHARINDEX('#',[value])   1, CHARINDEX('$',[value]) - CHARINDEX('#',[value]) - 1) as paramName,
  SUBSTRING([value], CHARINDEX('$',[value])   1, LEN([value])) as paramValue
FROM STRING_SPLIT(REPLACE((select data from test where Id = 1), '{CRLF}', CHAR(7)), CHAR(7))
WHERE [value] LIKE '%#%$%'

However, when I try also to filter in by paramName with this query:

select * from (select 
  SUBSTRING([value], 1, CHARINDEX('#',[value]) - 1) as paramNamespace,
  SUBSTRING([value], CHARINDEX('#',[value])   1, CHARINDEX('$',[value]) - CHARINDEX('#',[value]) - 1) as paramName,
  SUBSTRING([value], CHARINDEX('$',[value])   1, LEN([value])) as paramValue
FROM STRING_SPLIT(REPLACE((select data from test where Id = 1), '{CRLF}', CHAR(7)), CHAR(7))
WHERE [value] LIKE '%#%$%') a
where paramName IN ('name', 'name3')

it returns

Invalid length parameter passed to the LEFT or SUBSTRING function.

It looks like "WHERE [value] LIKE '%#%$%'" weren't executed first and it tries to substring value in other format.

How should I write this query to filter it by paramName?

Schema:

CREATE TABLE test (Id INT, data nvarchar(max));
INSERT INTO test VALUES (1, N'namespace#name$value{CRLF}somedatainotherformat{CRLF}namespace2#name2$value2{CRLF}namespace3#name3$value3{CRLF}');

Fiddle: http://sqlfiddle.com/#!18/e5d71/1

CodePudding user response:

Perhaps PARSENAME to split the data into columns is what you are after:

CREATE TABLE test (Id int,
                   data nvarchar(MAX));
INSERT INTO test
VALUES (1, N'namespace#name$value{CRLF}somedatainotherformat{CRLF}namespace2#name2$value2{CRLF}namespace3#name3$value3{CRLF}');
GO
SELECT *
FROM dbo.test t
     CROSS APPLY STRING_SPLIT(REPLACE(t.data,'{CRLF}','|'),'|') SS
     CROSS APPLY (VALUES(PARSENAME(TRANSLATE(SS.[value],'$#','..'),3),PARSENAME(TRANSLATE(SS.[value],'$#','..'),2),PARSENAME(TRANSLATE(SS.[value],'$#','..'),1)))V(paramNamespace, paramName, paramValue)
WHERE V.paramName IN ('name', 'name3')
GO

DROP TABLE test;

CodePudding user response:

Here is one option:

Select paramNamespace = substring(v.data, 1, p1.pos - 2)
     , paramName = substring(v.data, p1.pos, p2.pos - p1.pos - 1)
     , paramValue = substring(v.data, p2.pos, p3.pos - p2.pos - 1)
 From test As t
Cross Apply string_split(replace(t.data, '{CRLF}', '|'), '|') As s
Cross Apply (Values (concat(s.value, '#$'))) As v(data)
Cross Apply (Values (charindex('#', v.data, 1)   1)) As p1(pos)
Cross Apply (Values (charindex('$', v.data, p1.pos)   1)) As p2(pos)
Cross Apply (Values (charindex('#', v.data, p2.pos)   1)) As p3(pos)  --end of set
Where p2.pos - p1.pos > 1;

We concat the delimiters to the end - and then filter out any rows where the difference between the delimiters is greater than 1. So any rows that have only '#$' or rows that start with '#$' or rows that have some other value that does not include either of the delimiters.

If you have data that can have either delimiter - but not the other, you need to filter those out some other way.

  • Related