Home > Enterprise >  Split column value into separate columns based on length
Split column value into separate columns based on length

Time:12-21

I have multiple comma-separated values in one column with a size up to 20000 characters, and I want to split that column into one column but its based on character values 2000 (like into one new column it will take 2000 character and if length is grater than 2000 then its will be in second column like this).

When it's comma-separated value goes into first new column, then it should be meaningful like it should be based on , and up to 2000 characters only like this.

enter image description here

I have done from row level value to column level only but its should be 2000 character and based on comma

Could you please help me with this ?

CodePudding user response:

siddesh, although this question lacks of everything I want to point some things out and help you (as you are an unexperienced SO-user):

First I set up a minimal reproducible exampel. This is on you the next time.
I'll start with a declared table with some rows inserted.
We on SO can copy'n'paste this into our environment which makes it easy to answer.

DECLARE @tbl TABLE(ID INT IDENTITY, YourCSVString VARCHAR(MAX));
INSERT INTO @tbl VALUES('1 this is long text, 2 some second fragment, 3 third fragment, 4 adfjksdahfljsadhfjhadlfhasdjks alsdjfsadhf k, 5 halksjfh asdkf ')
                      ,('1 this is other long text, 2 some second fragment to show that this works with tabular data, 3 again a third fragment, 4 adfjksdahfljsadhfjhadlfhasdjks alsdjfsadhf k, 5 halksjfh asdkf ');

--This is, what you actually need:

SELECT fkID = t.ID
      ,B.fragmentPosition
      ,B.fragmentContent 
      ,C.framgentLength
FROM @tbl t
CROSS APPLY OPENJSON(CONCAT(N'["',REPLACE(t.YourCSVString,N',','","'),'"]')) A
CROSS APPLY(VALUES(A.[key],TRIM(A.[value]))) B(fragmentPosition,fragmentContent)
CROSS APPLY(VALUES(LEN(B.fragmentContent))) C(framgentLength);

The result should be stored within a physical table, where the fkID points to the ID of the original row and the fragmentPosition stores the order. fkID and fragmentPosition should be a combined unique key.

If you really want to do, what you are suggesting in your question (not recommended!) you can try something along this:

DECLARE @maxPerColumn INT=75;  --You can set the portion's max size, in your case 2000.

WITH cte AS
(
    SELECT fkID = t.ID
          ,B.fragmentPosition
          ,B.fragmentContent 
          ,C.framgentLength
    FROM @tbl t
    CROSS APPLY OPENJSON(CONCAT(N'["',REPLACE(t.YourCSVString,N',','","'),'"]')) A
    CROSS APPLY(VALUES(A.[key],TRIM(A.[value]))) B(fragmentPosition,fragmentContent)
    CROSS APPLY(VALUES(LEN(B.fragmentContent))) C(framgentLength)
)
,recCTE AS
(
    SELECT * 
          ,countPerColumn = 1
          ,columnCounter = 1
          ,sumLength = LEN(fragmentContent)
          ,growingString = CAST(fragmentContent AS NVARCHAR(MAX)) 
    FROM cte WHERE fragmentPosition=0

    UNION ALL
    SELECT r.fkID
          ,cte.fragmentPosition
          ,cte.fragmentContent
          ,cte.framgentLength
          ,CASE WHEN A.newSumLength>@maxPerColumn THEN 1 ELSE r.countPerColumn   1 END
          ,r.columnCounter   CASE WHEN A.newSumLength>@maxPerColumn THEN 1 ELSE 0 END
          ,CASE WHEN A.newSumLength>@maxPerColumn THEN LEN(cte.fragmentContent) ELSE newSumLength END
          ,CASE WHEN A.newSumLength>@maxPerColumn THEN cte.fragmentContent ELSE CONCAT(r.growingString,N', ',cte.fragmentContent) END
    FROM cte
    INNER JOIN recCTE r ON r.fkID=cte.fkID AND r.fragmentPosition 1=cte.fragmentPosition 
    CROSS APPLY(VALUES(r.sumLength LEN(cte.fragmentContent))) A(newSumLength)
)
SELECT TOP 1 WITH TIES 
       fkID
      ,growingString
      ,LEN(growingString)
FROM recCTE 
ORDER BY ROW_NUMBER() OVER(PARTITION BY fkID,columnCounter ORDER BY countPerColumn DESC );

The result

fkID    pos Content
1       2   1 this is long text, 2 some second fragment, 3 third fragment
1       4   4 adfjksdahfljsadhfjhadlfhasdjks alsdjfsadhf k, 5 halksjfh asdkf
2       0   1 this is other long text
2       1   2 some second fragment to show that this works with tabular data
2       3   3 again a third fragment, 4 adfjksdahfljsadhfjhadlfhasdjks alsdjfsadhf k
2       4   5 halksjfh asdkf

The idea in short:

  • The first cte does the splitting (as above)
  • The recursive cte will iterate down the string and do the magic.
  • The final SELECT uses a hack with TOP 1 WITH TIES together with an ORDER BY ROW_NUMBER() OVER(...). This will return the highest intermediate result only.

Hint: Don't do this...

UPDATE

Just for fun:

You can replace the final SELECT with this

,getPortions AS
(
    SELECT TOP 1 WITH TIES 
           fkID
          ,fragmentPosition
          ,growingString
          ,LEN(growingString) portionLength
    FROM recCTE 
    ORDER BY ROW_NUMBER() OVER(PARTITION BY fkID,columnCounter ORDER BY countPerColumn DESC )
)
SELECT p.*
FROM
(
    SELECT fkID
          ,CONCAT(N'col',ROW_NUMBER() OVER(PARTITION BY fkID ORDER BY fragmentPosition)) AS ColumnName
          ,growingString
    FROM getPortions
) t
PIVOT(MAX(growingString) FOR ColumnName IN(col1,col2,col3,col4,col5)) p;

This will return exactly what you are asking for.

But - as said before - this is against all rules of best practice...

  • Related