SQL - Separating a merge field into separate fields based on delimiters-CodePudding

I have a table with an nvarchar(max) column including a merged text like below:

ID  MyString
61  Team:Finance,Accounting,HR,Country:Global,
62  Country:Germany,
63  Team:Legal,
64  Team:Finance,Accounting,Country:Global,External:Tenants,Partners,
65  External:Vendors,

What I need is to create another table for each item having the Team, Country and External values separated into 3 different columns.

Id   Team                      Country   External
61   Finance,Accounting,HR     Global    NULL
62   NULL                      Germany   NULL
63   Legal                     NULL      NULL
64   Finance,Accounting        Global    Tenants,Partners
65   NULL                      NULL      Vendors

What is the most efficient way to do it? I'm trying to use STRING_SPLIT but couldn't manage it.

Any help would be appreciated.

CodePudding user response：

Please try the following solution.

Data resembles JSON, so we'll compose a proper JSON via few REPLACE() function calls.

SQL

-- DDL and sample data population, start
DECLARE @tbl TABLE (ID INT PRIMARY KEY, tokens NVARCHAR(MAX));
INSERT INTO @tbl (ID, tokens) VALUES
(61, 'Team:Finance,Accounting,HR,Country:Global,'),
(62, 'Country:Germany,'),
(63, 'Team:Legal,'),
(64, 'Team:Finance,Accounting,Country:Global,External:Tenants,Partners,'),
(65, 'External:Vendors,');
-- DDL and sample data population, end

SELECT * 
FROM @tbl
CROSS APPLY OPENJSON('{"'   REPLACE(REPLACE(REPLACE(TRIM(',' FROM tokens), ':', '": "')
    ,',Country',  '", "Country')
    ,',External', '", "External')   '"}')
WITH
(
   Team         VARCHAR(100)    '$.Team',
   Country      VARCHAR(100)    '$.Country',
   [External]   VARCHAR(100)    '$.External'
) AS u;

Output

 ---- ------------------------------------------------------------------- ----------------------- --------- ------------------ 
| ID |                              tokens                               |         Team          | Country |     External     |
 ---- ------------------------------------------------------------------- ----------------------- --------- ------------------ 
| 61 | Team:Finance,Accounting,HR,Country:Global,                        | Finance,Accounting,HR | Global  | NULL             |
| 62 | Country:Germany,                                                  | NULL                  | Germany | NULL             |
| 63 | Team:Legal,                                                       | Legal                 | NULL    | NULL             |
| 64 | Team:Finance,Accounting,Country:Global,External:Tenants,Partners, | Finance,Accounting    | Global  | Tenants,Partners |
| 65 | External:Vendors,                                                 | NULL                  | NULL    | Vendors          |
 ---- ------------------------------------------------------------------- ----------------------- --------- ------------------

CodePudding user response：

Firstly, let me repeat my comments here. SQL Server is the last place you should be doing this; it's string manipulation is poor and you have a severely denormalised design, with denormalised data containing denormalised data. Fixing your design to a normalised approach must be a priority, as leaving your data in this state is only going to make things harder the further you go down this rabbit hole.

One method you could use to achieve this, however, would be with a JSON splitter and some restring aggregation, but this is real ugly. The choice of having the "column" and "row" delimiter to both be a comma (,) makes this a complete mess, and I am not going to explain what it's doing because you just should not be doing this.

WITH YourTable AS(
    SELECT *
    FROM (VALUES(61,'Team:Finance,Accounting,HR,Country:Global,'),
                (62,'Country:Germany,'),
                (63,'Team:Legal,'),
                (64,'Team:Finance,Accounting,Country:Global,External:Tenants,Partners,'),
                (65,'External:Vendors,'))V(ID,MyString)),
PartiallyNormal AS(
    SELECT YT.ID,
           CONVERT(int,LEAD(OJC.[Key],1,OJC.[Key]) OVER (PARTITION BY ID ORDER BY OJC.[Key], OJV.[Key])) AS ColumnNo,
           OJV.[value],
           CONVERT(int,OJC.[key]) AS [key]
    FROM YourTable YT
         CROSS APPLY OPENJSON(CONCAT('["', REPLACE(YT.MyString,':','","'),'"]')) OJC
         CROSS APPLY OPENJSON(CONCAT('["', REPLACE(OJC.[value],',','","'),'"]')) OJV),
WithNames AS(
    SELECT ID,
           ColumnNo,
           [value],
           [key],
           FIRST_VALUE(PN.[Value]) OVER (PARTITION BY ID, ColumnNo ORDER BY [Key]) AS ColumnName
    FROM PartiallyNormal PN)
SELECT ID,
       TRIM(',' FROM STRING_AGG(CASE ColumnName WHEN 'Team' THEN NULLIF([value],'''') END,',') WITHIN GROUP (ORDER BY [key])) AS Team, --TRIM because I've not taken the time to work out why there are sometimes a trailing comma
       TRIM(',' FROM STRING_AGG(CASE ColumnName WHEN 'Country' THEN NULLIF([value],'''') END,',') WITHIN GROUP (ORDER BY [key])) AS Country,
       TRIM(',' FROM STRING_AGG(CASE ColumnName WHEN 'External' THEN NULLIF([value],'''') END,',') WITHIN GROUP (ORDER BY [key])) AS [External]
FROM WithNames WN
WHERE [value] <> [ColumnName]
GROUP BY ID
ORDER BY ID;

db<>fiddle

CodePudding user response：

STRING_SPLIT in SQL Server 2017 doesn't tell us the order of the items in the list, so it can't be used here. Only SQL Server 2022 would add a parameter to STRING_SPLIT that would tell the order of the items.

Until that version of SQL Server the most efficient method would likely be the CLR. Write your parser in C# and call your function using CLR.

CodePudding user response：

Another option is:

splitting the string using the STRING_SPLIT function on the colon
extracting consecutive strings using the LAG function
removing the string identifiers (Team, Country and External)
aggregating on the ID to remove NULL values

Here's the query:

WITH cte AS (
    SELECT ID,
           LAG(value) OVER(PARTITION BY ID ORDER BY (SELECT 1)) AS prev_value,
           value
    FROM tab
    CROSS APPLY STRING_SPLIT(MyString, ':')
)
SELECT ID,
       MAX(CASE WHEN prev_value LIKE 'Team' 
                THEN REPLACE(value, ',Country', '') END) AS [Team],
       MAX(CASE WHEN prev_value LIKE '%Country' 
                THEN LEFT(value, LEN(value)-1)      END) AS [Country],
       MAX(CASE WHEN prev_value LIKE '%External' 
                THEN LEFT(value, LEN(value)-1)      END) AS [External]
FROM cte
GROUP BY ID

Check the demo here.