I have comma separated data in a column named C0.
The data in C0 looks like this:
C0 |
---|
"Pacey LLC.",213830ZZ,11/1/2017,11/1/2017,"297,311.74","2,371.40",0.00,"1,325.18",0.00,42.22,"123,986.56" |
Mike The Miker,9814140VCD,12/1/2018,12/1/2018,"3,917,751.99","419,743.54","36,642.66","344,090.43",0.00,10.00,"2,434,671.06" |
And I want it to end up like this:
F1 | F1 | F3 | F4 | F5 | F6 | F7 | F8 | F9 | F10 | F11 |
---|---|---|---|---|---|---|---|---|---|---|
"Pacey LLC." | 213830ZZ | 11/1/2017 | 11/1/2017 | 297,311.74 | 2,371.40 | 0.00 | 1,325.18 | 0.00 | 42.22 | 123,986.56 |
Mike The Miker | 9814140VCD | 12/1/2018 | 12/1/2018 | 3,917,751.99 | 419,743.54 | 36,642.66 | 344,090.43 | 0.00 | 10.00 | 2,434,671.06 |
I've tried nested replaces, but couldn't find a pattern to reliably search without regex which is T/SQL? I've also tried a TOKEN approach in SSIS by this feller, but neither fruitful.
The nested replace approaches got stuck on the money fields that are under 1,000 (like 0.00) and the SSIS TOKEN approach presumes all fields are quote delimited, which in my example they aren't.
CodePudding user response:
As you were told already, TSQL is the wrong tool for this. Nevertheless this can be done (at least for the set given). If this is a one-time action you might give it a try. If this is a re-occurring task in a real-life scenario I'd try to get the data in an appropriate format.
However, this would work for the given lines:
DECLARE @t1 TABLE(ID INT IDENTITY, YourString NVARCHAR(1000));
INSERT INTO @t1 VALUES(N'"Pacey LLC.",213830ZZ,11/1/2017,11/1/2017,"297,311.74","2,371.40",0.00,"1,325.18",0.00,42.22,"123,986.56"')
,(N'Mike The Miker,9814140VCD,12/1/2018,12/1/2018,"3,917,751.99","419,743.54","36,642.66","344,090.43",0.00,10.00,"2,434,671.06"');
--Your data includes dates in a culture specific format (something really! bad)
--Better switch to ISO8601
--Setting the date format will help, but is NOT recommended
SET DATEFORMAT dmy;
--the first cte will use APPLY
together with a computed TOP()
--This will allow to get each single character, one by one.
WITH singleChars AS
(
SELECT t.ID
,A.Pos
,SUBSTRING(t.YourString,A.POs,1) AS CharOnPos
FROM @t1 t
CROSS APPLY(SELECT TOP (LEN(t.YourString)) ROW_NUMBER() OVER(ORDER BY (SELECT NULL)) FROM master..spt_values) A(Pos) --master..spt_values can be any table with sufficient rows
)
--we continue with a recursive cte
--it will run through the string and find if we are within a quoted area or not
,recCTE AS
(
SELECT *
,CASE WHEN CharOnPos='"' THEN 1 ELSE 0 END AS QuoteIsOpen
,CAST(CharOnPos AS NVARCHAR(MAX)) AS GrowingString
FROM singleChars WHERE Pos=1
UNION ALL
SELECT sc.ID,sc.Pos,sc.CharOnPos
,A.QuoteIsStillOpen
,CONCAT(GrowingString,CASE WHEN sc.CharOnPos=N',' AND A.QuoteIsStillOpen=0 THEN N'$%&' ELSE sc.CharOnPos END)
FROM singleChars sc
INNER JOIN recCTE r ON sc.ID = r.ID AND sc.Pos=r.Pos 1
CROSS APPLY(VALUES(CASE WHEN sc.CharOnPos='"' THEN CASE WHEN r.QuoteIsOpen=1 THEN 0 ELSE 1 END ELSE r.QuoteIsOpen END )) A(QuoteIsStillOpen)
)
--this CTE performs a trick with TOP 1 WITH TIES
together with ORDER BY
a partitioned ROW_NUMBER()
--The result will include the final string of the recursion by ID
,newlySeparated AS
(
SELECT TOP 1 WITH TIES * FROM recCTE
ORDER BY ROW_NUMBER() OVER(PARTITION BY ID ORDER BY Pos DESC)
)
--The final SELECT
uses a trick to split strings position- and type-safe
SELECT A.*
FROM newlySeparated ns
CROSS APPLY OPENJSON(CONCAT(N'[["',REPLACE(REPLACE(ns.GrowingString,'"',''),'$%&','","'),N'"]]'))
WITH(Company NVARCHAR(100) '$[0]'
,Code1 NVARCHAR(100) '$[1]'
,Date1 DATE '$[2]'
,Date2 DATE '$[3]'
,Decimal1 NVARCHAR(100) '$[4]' --Using a numbers type might work here, this depends on your machine
,Decimal2 NVARCHAR(100) '$[5]'
,Decimal3 NVARCHAR(100) '$[6]'
,Decimal4 NVARCHAR(100) '$[7]'
,Decimal5 NVARCHAR(100) '$[8]'
,Decimal6 NVARCHAR(100) '$[9]'
,Decimal7 NVARCHAR(100) '$[10]') A
OPTION(MAXRECURSION 0);
The result
---------------- ------------ ------------ ------------ -------------- ------------ ----------- ------------ ------ ------- --------------
| Pacey LLC. | 213830ZZ | 2017-01-11 | 2017-01-11 | 297,311.74 | 2,371.40 | 0.00 | 1,325.18 | 0.00 | 42.22 | 123,986.56 |
---------------- ------------ ------------ ------------ -------------- ------------ ----------- ------------ ------ ------- --------------
| Mike The Miker | 9814140VCD | 2018-01-12 | 2018-01-12 | 3,917,751.99 | 419,743.54 | 36,642.66 | 344,090.43 | 0.00 | 10.00 | 2,434,671.06 |
---------------- ------------ ------------ ------------ -------------- ------------ ----------- ------------ ------ ------- --------------