I have a table with an nvarchar(max) column including a merged text like below:
ID MyString
61 Team:Finance,Accounting,HR,Country:Global,
62 Country:Germany,
63 Team:Legal,
64 Team:Finance,Accounting,Country:Global,External:Tenants,Partners,
65 External:Vendors,
What I need is to create another table for each item having the Team, Country and External values separated into 3 different columns.
Id Team Country External
61 Finance,Accounting,HR Global NULL
62 NULL Germany NULL
63 Legal NULL NULL
64 Finance,Accounting Global Tenants,Partners
65 NULL NULL Vendors
What is the most efficient way to do it? I'm trying to use STRING_SPLIT but couldn't manage it.
Any help would be appreciated.
CodePudding user response:
Please try the following solution.
Data resembles JSON, so we'll compose a proper JSON via few REPLACE()
function calls.
SQL
-- DDL and sample data population, start
DECLARE @tbl TABLE (ID INT PRIMARY KEY, tokens NVARCHAR(MAX));
INSERT INTO @tbl (ID, tokens) VALUES
(61, 'Team:Finance,Accounting,HR,Country:Global,'),
(62, 'Country:Germany,'),
(63, 'Team:Legal,'),
(64, 'Team:Finance,Accounting,Country:Global,External:Tenants,Partners,'),
(65, 'External:Vendors,');
-- DDL and sample data population, end
SELECT *
FROM @tbl
CROSS APPLY OPENJSON('{"' REPLACE(REPLACE(REPLACE(TRIM(',' FROM tokens), ':', '": "')
,',Country', '", "Country')
,',External', '", "External') '"}')
WITH
(
Team VARCHAR(100) '$.Team',
Country VARCHAR(100) '$.Country',
[External] VARCHAR(100) '$.External'
) AS u;
Output
---- ------------------------------------------------------------------- ----------------------- --------- ------------------
| ID | tokens | Team | Country | External |
---- ------------------------------------------------------------------- ----------------------- --------- ------------------
| 61 | Team:Finance,Accounting,HR,Country:Global, | Finance,Accounting,HR | Global | NULL |
| 62 | Country:Germany, | NULL | Germany | NULL |
| 63 | Team:Legal, | Legal | NULL | NULL |
| 64 | Team:Finance,Accounting,Country:Global,External:Tenants,Partners, | Finance,Accounting | Global | Tenants,Partners |
| 65 | External:Vendors, | NULL | NULL | Vendors |
---- ------------------------------------------------------------------- ----------------------- --------- ------------------
CodePudding user response:
Firstly, let me repeat my comments here. SQL Server is the last place you should be doing this; it's string manipulation is poor and you have a severely denormalised design, with denormalised data containing denormalised data. Fixing your design to a normalised approach must be a priority, as leaving your data in this state is only going to make things harder the further you go down this rabbit hole.
One method you could use to achieve this, however, would be with a JSON splitter and some restring aggregation, but this is real ugly. The choice of having the "column" and "row" delimiter to both be a comma (,
) makes this a complete mess, and I am not going to explain what it's doing because you just should not be doing this.
WITH YourTable AS(
SELECT *
FROM (VALUES(61,'Team:Finance,Accounting,HR,Country:Global,'),
(62,'Country:Germany,'),
(63,'Team:Legal,'),
(64,'Team:Finance,Accounting,Country:Global,External:Tenants,Partners,'),
(65,'External:Vendors,'))V(ID,MyString)),
PartiallyNormal AS(
SELECT YT.ID,
CONVERT(int,LEAD(OJC.[Key],1,OJC.[Key]) OVER (PARTITION BY ID ORDER BY OJC.[Key], OJV.[Key])) AS ColumnNo,
OJV.[value],
CONVERT(int,OJC.[key]) AS [key]
FROM YourTable YT
CROSS APPLY OPENJSON(CONCAT('["', REPLACE(YT.MyString,':','","'),'"]')) OJC
CROSS APPLY OPENJSON(CONCAT('["', REPLACE(OJC.[value],',','","'),'"]')) OJV),
WithNames AS(
SELECT ID,
ColumnNo,
[value],
[key],
FIRST_VALUE(PN.[Value]) OVER (PARTITION BY ID, ColumnNo ORDER BY [Key]) AS ColumnName
FROM PartiallyNormal PN)
SELECT ID,
TRIM(',' FROM STRING_AGG(CASE ColumnName WHEN 'Team' THEN NULLIF([value],'''') END,',') WITHIN GROUP (ORDER BY [key])) AS Team, --TRIM because I've not taken the time to work out why there are sometimes a trailing comma
TRIM(',' FROM STRING_AGG(CASE ColumnName WHEN 'Country' THEN NULLIF([value],'''') END,',') WITHIN GROUP (ORDER BY [key])) AS Country,
TRIM(',' FROM STRING_AGG(CASE ColumnName WHEN 'External' THEN NULLIF([value],'''') END,',') WITHIN GROUP (ORDER BY [key])) AS [External]
FROM WithNames WN
WHERE [value] <> [ColumnName]
GROUP BY ID
ORDER BY ID;
CodePudding user response:
STRING_SPLIT
in SQL Server 2017 doesn't tell us the order of the items in the list, so it can't be used here.
Only SQL Server 2022 would add a parameter to STRING_SPLIT
that would tell the order of the items.
Until that version of SQL Server the most efficient method would likely be the CLR. Write your parser in C# and call your function using CLR.
CodePudding user response:
Another option is:
- splitting the string using the
STRING_SPLIT
function on the colon - extracting consecutive strings using the
LAG
function - removing the string identifiers (
Team
,Country
andExternal
) - aggregating on the
ID
to remove NULL values
Here's the query:
WITH cte AS (
SELECT ID,
LAG(value) OVER(PARTITION BY ID ORDER BY (SELECT 1)) AS prev_value,
value
FROM tab
CROSS APPLY STRING_SPLIT(MyString, ':')
)
SELECT ID,
MAX(CASE WHEN prev_value LIKE 'Team'
THEN REPLACE(value, ',Country', '') END) AS [Team],
MAX(CASE WHEN prev_value LIKE '%Country'
THEN LEFT(value, LEN(value)-1) END) AS [Country],
MAX(CASE WHEN prev_value LIKE '%External'
THEN LEFT(value, LEN(value)-1) END) AS [External]
FROM cte
GROUP BY ID
Check the demo here.