Snowflake Regular Expression-CodePudding

I have this string in Snowflake column:

\[
{
"entryListId": 3279,
"id": 4617,
"name": "SpecTra",
"type": 0
},
{
"entryListId": 3279,
"id": 7455,
"name": "Signal Capital Partners",
"type": 0
}
\]

I need to get names in this format regardless of the number of company names: "SpecTra, Signal Capital Partners". In other words, I need to extract company names and concatenate them.

I have tried this :

regexp_replace(col, '"(\[^"\] )"|.', '\\1|')

and regexp_substr() function, but did not get the desired output

Can you please help me with this? Thanks

CodePudding user response：

You can use

trim(regexp_replace(regexp_replace(col, '"name":\\s*"([^"] )"|.', '\\1,'), ', ', ','), ',')

Details:

"name":\s*"([^"] )"|. regex matches "name":, then zero or more whitespaces, and a ", and then captures into Group 1 any one or more chars other than " and then matches a " char, and replaces with Group 1 and a comma
The second regexp_replace shrinks all commas into a single occurrence of a comma, , matches one or more commas (you may also use a more specific ,{2,} pattern here instead)
trim removes commas from start and end.

CodePudding user response：

So pushing you text blob into a CTE.

with data as (
    SELECT * FROM VALUES
    ('[{"entryListId": 3279,"id": 4617,"name": "SpecTra","type": 0},{"entryListId": 3279,"id": 7455,"name": "Signal Capital Partners","type": 0}]')
    t(str)
)

I cannot help but note it's JSON, so lets PARSE_JSON that and then FLATTEN it, and here are you "names"

select 
    d.*
    ,f.value:name::text as name
from data d
    ,table(flatten(input=>parse_json(d.str))) f

giving:

STR	NAME
[{"entryListId": 3279,"id": 4617,"name": "SpecTra","type": 0},{"entryListId": 3279,"id": 7455,"name": "Signal Capital Partners","type": 0}]	SpecTra
[{"entryListId": 3279,"id": 4617,"name": "SpecTra","type": 0},{"entryListId": 3279,"id": 7455,"name": "Signal Capital Partners","type": 0}]	Signal Capital Partners