Let's say I have a table like the one below
| Header 1 | Header 2 | Header 3
--------------------------------------------------------------------------------------
| id1 | detail1 | <[email protected]> , <[email protected]> , <[email protected]> , <[email protected]>
How do i explode it on SQL based on the substring emails inside the angle brackets such that it looks like the one below.
| Header 1 | Header 2 | Header 3. |
-------------------------------------------
| id1 | detail1 | [email protected] |
| id1 | detail1 | [email protected] |
| id1 | detail1 | [email protected] |
| id1 | detail1 | [email protected] |
CodePudding user response:
Using regexp_extract_all
and explode
should do.
select `Header 1`, `Header 2`, explode(regexp_extract_all(`Header 3`, '<(. ?)>')) as `Header 3` from table
this should get you
-------- -------- ----------
|Header 1|Header 2|Header 3 |
-------- -------- ----------
|id1 |detail1 |[email protected]|
|id1 |detail1 |[email protected]|
|id1 |detail1 |[email protected]|
|id1 |detail1 |[email protected]|
-------- -------- ----------
Be aware that regexp_extract_all
was added to spark since version 3.1.0
.
For spark blow 3.1.0
This can be done with split
, somewhat a dirty hack. But the strategy and the results are the same.
select `Header 1`, `Header 2`, explode(array_remove(split(`Header 3`, '[<>,\\s] '), '')) as `Header 3` from table
What this do is to regex match the delimiters and split the string into array. It also needs an array_remove
function call to remove unneeded empty string.
Explanation
With regexp_extract_all
, we use the pattern <(. ?)>
to extract all strings within angle brackets, into an array like this
['[email protected]', '[email protected]', '[email protected]']
For the pattern (. ?)
here
.
matches 1 character;.
, looking for 1 or unlimited matches;?
is a non greedy modifier, makes the match stop as soon as possible;- brackets makes the pattern with in angle brackets as a matching group, so we can extract from groups later;
Now with explode
, we can separate elements of the array into multiple rows, hence the result above.