Home > database >  Extract number from a URL in Redshift
Extract number from a URL in Redshift

Time:11-25

I would like to extract an ID (a number) from a bunch of URLs in Redshift. I know I can use regexp_substr for this purpose, but my knowledge of regular expressions is weak. Here are a couple example URLs:

/checkout?feature=ADVANCED_SEARCH&upgradeRedirect=%2Fmentions%3Ftop_ids%3D1222874068&btv=feature_ADVANCED_SEARCH
/checkout?feature=ADVANCED_SEARCH&trigger=mentioning-author-rw&upgradeRedirect=%2Fmentions%3Ftop_ids%3D160447990

After parsing the above URLs, I would like the output to be:

1222874068
160447990

Note that the parameter top_ids remains constant and will help break the URL. I tried using multiple versions of split_part as well. But there may be variations in the URL where it might break. So using a regular expression may be a better idea.

Any help would be greatly appreciated.

CodePudding user response:

You can use:

select regexp_substr(column,'top_ids=([0-9]*)', 1, 1, 'e')

The 'e' extracts the substring in (brackets).

CodePudding user response:

Try something like this:

SUBSTR(REGEXP_SUBSTR(yourcolumn, 'top_ids=([0-9]{2,})'), 11, 20)

Just looking for 'top_ids=' and 2 or more digits after it.
Then removes the first 10 characters.

  • Related