Home > Back-end >  How to explode substrings inside a string in a column in SQL
How to explode substrings inside a string in a column in SQL

Time:06-23

Let's say I have a table like the one below

| Header 1  | Header 2  | Header 3       
--------------------------------------------------------------------------------------                                                
| id1       | detail1   | <[email protected]> , <[email protected]> , <[email protected]>  , <[email protected]>    

How do i explode it on SQL based on the substring emails inside the angle brackets such that it looks like the one below.

| Header 1  | Header 2   | Header 3.   |
-------------------------------------------
| id1       | detail1    | [email protected]  |
| id1       | detail1    | [email protected]  |
| id1       | detail1    | [email protected]  |
| id1       | detail1    | [email protected]  |

CodePudding user response:

Using regexp_extract_all and explode should do.

select `Header 1`, `Header 2`, explode(regexp_extract_all(`Header 3`, '<(. ?)>')) as `Header 3` from table

this should get you

 -------- -------- ---------- 
|Header 1|Header 2|Header 3  |
 -------- -------- ---------- 
|id1     |detail1 |[email protected]|
|id1     |detail1 |[email protected]|
|id1     |detail1 |[email protected]|
|id1     |detail1 |[email protected]|
 -------- -------- ---------- 

Be aware that regexp_extract_all was added to spark since version 3.1.0.


For spark blow 3.1.0

This can be done with split, somewhat a dirty hack. But the strategy and the results are the same.

select `Header 1`, `Header 2`, explode(array_remove(split(`Header 3`, '[<>,\\s] '), '')) as `Header 3` from table

What this do is to regex match the delimiters and split the string into array. It also needs an array_remove function call to remove unneeded empty string.


Explanation

With regexp_extract_all, we use the pattern <(. ?)> to extract all strings within angle brackets, into an array like this

['[email protected]', '[email protected]', '[email protected]']

For the pattern (. ?)here

  • . matches 1 character;
  • is a quantifier of ., looking for 1 or unlimited matches;
  • ? is a non greedy modifier, makes the match stop as soon as possible;
  • brackets makes the pattern with in angle brackets as a matching group, so we can extract from groups later;

Now with explode, we can separate elements of the array into multiple rows, hence the result above.

  • Related