I have a following data and I want to split the each row by delimiter into new rows.
Sample Data:
**Row No. | Data**
1 | abc,pqr,lmn,omr
2 | aaa,bbb,ccc,ddd
And the output should be:
**Row No. | Data**
1 | abc
2 | pqr
3 | lmn
4 | omr
5 | aaa
6 | bbb
6 | ccc
6 | ddd
I have a following code snippet but it only works for the first row.
SELECT * EXCEPT(c) REPLACE(c AS col)
FROM `project.dataset.table`,
UNNEST(SPLIT(col)) c;
CodePudding user response:
Use the following:
WITH myData AS (
select
id,
values
FROM
UNNEST([
STRUCT<id int, values string>(1, 'abc,pqr,lmn,omr'),
STRUCT<id int, values string>(2, 'aaa,bbb,ccc,ddd')
])
)
SELECT
id,
data
FROM myData,
UNNEST(SPLIT(values, ',')) AS data
ORDER BY id ASC
I start by just creating a sample temporary table with your test data. The gist of what you're looking for is there at the end. You're on the right track using UNNEST
to break out the elements of the array out to separate rows, but you were missing the SPLIT
function to actually break apart the individual values in the string by their comma delimiters.
This query yields the following:
id | data |
---|---|
1 | abc |
1 | pqr |
1 | lmn |
1 | omr |
2 | aaa |
2 | bbb |
2 | ccc |
2 | ddd |
And for just a tweak on that - if you simply want a monotonically increasing number as the ID instead of it repeating as it does there (e.g. no need to link back to an ID), update your query to the following:
WITH myData AS (
select
id,
values
FROM
UNNEST([
STRUCT<id int, values string>(1, 'abc,pqr,lmn,omr'),
STRUCT<id int, values string>(2, 'aaa,bbb,ccc,ddd')
])
)
SELECT ROW_NUMBER() OVER() AS id,
data
FROM myData,
UNNEST(SPLIT(values, ',')) AS data
ORDER BY id ASC
Here I simply dropped use of the id
field in the SELECT expression and replaced with an incrementing row number that increments over all the records in the data set. This yields the following:
id | data |
---|---|
1 | abc |
2 | pqr |
3 | lmn |
4 | omr |
5 | aaa |
6 | bbb |
7 | ccc |
8 | ddd |