Home > Enterprise >  How to extract a substring of a URL with regex
How to extract a substring of a URL with regex

Time:03-25

Working with some unstructured data and hoping to extract utm_campaign names from URLs. Here's an example of one URL. I'd like to extract the name following "utm_campaign=" but before "&utm_source" each time. How can I do this with regex?

https://data.io/?utm_campaign=branded&utm_source=google&utm_medium=cpc&utm_term=data&gclid=CjwKCAjwiuuRBhBvEiwAFXKaNGwDI2MaPLQpb9duvQZ2XeOA4vpk7FuBTqhYiJ2ysnirdUh8SZ6-zxoC4VYQAvD_BwE

I've tried various regex functions, for instance:

regexp_substr(get_path(event_properties, 'url'), '(\[A-Za-z\_-\]\*)-utm_campaign')

CodePudding user response:

Easier done with parse_url

select parse_url(your_url):parameters:"utm_campaign"::string

Or you can divide and conquer using split_part

select split_part(split_part(your_url,'utm_campaign=',-1),'&',1)

CodePudding user response:

Try this one:

SELECT regexp_substr(get_path(event_properties, 'url'), '\\b [-utm_campaign=](\\w )&\\b ', 1, 1, 'e', 1)

Reference: REGEXP_SUBSTR

  • Related