I would like to extract an ID (a number) from a bunch of URLs in Redshift. I know I can use regexp_substr
for this purpose, but my knowledge of regular expressions is weak. Here are a couple example URLs:
/checkout?feature=ADVANCED_SEARCH&upgradeRedirect=%2Fmentions%3Ftop_ids%3D1222874068&btv=feature_ADVANCED_SEARCH
/checkout?feature=ADVANCED_SEARCH&trigger=mentioning-author-rw&upgradeRedirect=%2Fmentions%3Ftop_ids%3D160447990
After parsing the above URLs, I would like the output to be:
1222874068
160447990
Note that the parameter top_ids
remains constant and will help break the URL.
I tried using multiple versions of split_part
as well. But there may be variations in the URL where it might break. So using a regular expression may be a better idea.
Any help would be greatly appreciated.
CodePudding user response:
You can use:
select regexp_substr(column,'top_ids=([0-9]*)', 1, 1, 'e')
The 'e'
extracts the substring in (brackets).
CodePudding user response:
Try something like this:
SUBSTR(REGEXP_SUBSTR(yourcolumn, 'top_ids=([0-9]{2,})'), 11, 20)
Just looking for 'top_ids=' and 2 or more digits after it.
Then removes the first 10 characters.