Split delimiter based rows in to new rows

I have a following data and I want to split the each row by delimiter into new rows.

Sample Data:

**Row No. | Data**
        1 | abc,pqr,lmn,omr
        2 | aaa,bbb,ccc,ddd

And the output should be:

**Row No. | Data**
        1 | abc
        2 | pqr
        3 | lmn
        4 | omr
        5 | aaa
        6 | bbb
        6 | ccc
        6 | ddd

I have a following code snippet but it only works for the first row.

    SELECT * EXCEPT(c) REPLACE(c AS col) 
    FROM `project.dataset.table`, 
    UNNEST(SPLIT(col)) c;

CodePudding user response：

Use the following:

WITH myData AS (
  select 
    id, 
    values
    FROM 
      UNNEST([
        STRUCT<id int, values string>(1, 'abc,pqr,lmn,omr'),
        STRUCT<id int, values string>(2, 'aaa,bbb,ccc,ddd')
      ])
)
SELECT 
  id,
  data
FROM myData,
UNNEST(SPLIT(values, ',')) AS data
ORDER BY id ASC

I start by just creating a sample temporary table with your test data. The gist of what you're looking for is there at the end. You're on the right track using UNNEST to break out the elements of the array out to separate rows, but you were missing the SPLIT function to actually break apart the individual values in the string by their comma delimiters.

This query yields the following:

id	data
1	abc
1	pqr
1	lmn
1	omr
2	aaa
2	bbb
2	ccc
2	ddd

And for just a tweak on that - if you simply want a monotonically increasing number as the ID instead of it repeating as it does there (e.g. no need to link back to an ID), update your query to the following:

WITH myData AS (
  select 
    id, 
    values
    FROM 
      UNNEST([
        STRUCT<id int, values string>(1, 'abc,pqr,lmn,omr'),
        STRUCT<id int, values string>(2, 'aaa,bbb,ccc,ddd')
      ])
)
SELECT ROW_NUMBER() OVER() AS id,
  data
FROM myData,
UNNEST(SPLIT(values, ',')) AS data
ORDER BY id ASC

Here I simply dropped use of the id field in the SELECT expression and replaced with an incrementing row number that increments over all the records in the data set. This yields the following:

id	data
1	abc
2	pqr
3	lmn
4	omr
5	aaa
6	bbb
7	ccc
8	ddd