Working with some unstructured data and hoping to extract utm_campaign names from URLs. Here's an example of one URL. I'd like to extract the name following "utm_campaign=" but before "&utm_source" each time. How can I do this with regex?
https://data.io/?utm_campaign=branded&utm_source=google&utm_medium=cpc&utm_term=data&gclid=CjwKCAjwiuuRBhBvEiwAFXKaNGwDI2MaPLQpb9duvQZ2XeOA4vpk7FuBTqhYiJ2ysnirdUh8SZ6-zxoC4VYQAvD_BwE
I've tried various regex functions, for instance:
regexp_substr(get_path(event_properties, 'url'), '(\[A-Za-z\_-\]\*)-utm_campaign')
CodePudding user response:
Easier done with parse_url
select parse_url(your_url):parameters:"utm_campaign"::string
Or you can divide and conquer using split_part
select split_part(split_part(your_url,'utm_campaign=',-1),'&',1)
CodePudding user response:
Try this one:
SELECT regexp_substr(get_path(event_properties, 'url'), '\\b [-utm_campaign=](\\w )&\\b ', 1, 1, 'e', 1)
Reference: REGEXP_SUBSTR