have the following string (it's a salesforce query, but not important):
IF(OR(CONTAINS(EmailDomain,"yahoo"),CONTAINS(EmailDomain,"gmail"),
CONTAINS("protonmail.com,att.net,chpmail.com,smail.com",EmailDomain)),
"Free Mail","Business Email")
and I want to get an array of all substrings that are encapsulated between double quotes like so:
['yahoo',
'gmail',
'protonmail.com,att.net,chpmail.com,smail.com',
'Free Mail',
'Business Email']
in python I do:
re.findall(r'"(. ?)"', <my string>)
but is there a way to replicate this in Snowflake?
I've tried
SELECT
REGEXP_SUBSTR('IF(OR(CONTAINS(EmailDomain,"yahoo"),CONTAINS(EmailDomain,"gmail"),
CONTAINS("protonmail.com,att.net,chpmail.com,smail.com",EmailDomain)),
"Free Mail","Business Email")', '"(. ?)"') as emails;
but I get this:
"yahoo"),CONTAINS(EmailDomain,"gmail"
CodePudding user response:
You can use
select split(trim(regexp_replace(regexp_replace(col, '"([^"] )"|.', '\\1|'),'\\| ','|'), '|'), '|');
Details:
regexp_replace(col, '"([^"] )"|.', '\\1|')
- finds any strings between the closest double quotes while capturing the part inside quotes into Group 1, or matching any single char and replaces each match with Group 1 contents|
char (see the regex demo)regexp_replace(...,'\\| ','|')
- this shrinks all consecutive pipe symbols into a single occurrence of a|
char (see this regex demo)trim(..., '|')
- removes|
chars on both ends of the stringsplit(..., '|')
- splits the string with a|
char.
CodePudding user response:
Wiktor's answer works great. I'm adding an alternate answer for anyone who needs to do this and their quoted strings may contain the pipe |
character. Using the replacement method on strings containing pipe(s) will split the string into more than one array member. Here's a way (not the only way) to do it that will work in case the quoted strings could potentially contain pipe characters:
set col = $$IF(OR(CONTAINS(EmailDomain,"yahoo"),CONTAINS(EmailDomain,"gmail"),CONTAINS("protonmail.com,att.net,chpmail.com,smail.com",EmailDomain)),"Free Mail","Business Email | Other")$$;
create or replace function GET_QUOTED_STRINGS("s" string)
returns array
language javascript
strict immutable
as
$$
var re = /(["'])(?:\\.|[^\\])*?\1/g;
var m;
var out = [];
do {
m = re.exec(s);
if (m) {
out.push(m[0].replace(/['"] /g, ''));
}
} while (m);
return out;
$$;
select get_quoted_strings($col);