I am trying to extract the sub-string of Twitter and Fortnite, Facebook and Words with Friends, & Expedia and Notion from the strings below. These are just examples and the common pattern amongst all of them is "AppNames";s:xx:" with xx being a different number for each string. I'm having a difficult time accounting for the different numbers. Here is my current REGEX_SUBSTR query that doesn't take into account the varying numerical value in s:xx
REGEXP_SUBSTR(FORM_FIELDS, '"AppNames";s:17:\"([^\"] )', 1, 1, 'e')
Just to reiterate, the above query will output Twitter and Fortnite, which is correct for the first string but I have many other strings that are similar but have a different numerical patterns such as...
- "AppNames";s:35
- "AppNames";s:50
- "AppNames";s:44
STRINGS
s:25:"Social/Games";s:14:"AppNames";s:17:"Twitter and Fortnite";s:12:"Audience";s:20;} s:25:"Social/Games";s:14:"AppNames";s:35:"Facebook and Words with Friends";s:12:"Audience";s:20;} s:15:"Travel/Productivity";s:19:"AppNames";s:50:"Expedia and Notion";s:12:"Audience";s:20;}
CodePudding user response:
with data(FORM_FIELDS) as (
SELECT * from values
('s:25:"Social/Games";s:14:"AppNames";s:17:"Twitter and Fortnite";s:12:"Audience";s:20;}'),
('s:25:"Social/Games";s:14:"AppNames";s:35:"Facebook and Words with Friends";s:12:"Audience";s:20;}'),
('s:15:"Travel/Productivity";s:19:"AppNames";s:50:"Expedia and Notion";s:12:"Audience";s:20;}')
)
select
REGEXP_SUBSTR(FORM_FIELDS, '"AppNames";s:[0-9] :\"([^\"] )', 1, 1, 'e') as new
,REGEXP_SUBSTR(FORM_FIELDS, '"AppNames";s:17:\"([^\"] )', 1, 1, 'e') as old
from data;
gives:
NEW | OLD |
---|---|
Twitter and Fortnite | Twitter and Fortnite |
Facebook and Words with Friends | null |
Expedia and Notion | null |
so [0-9]
will match 1 digits, you can also use \\d
but if you only want 2 \\d{2}
will force that.