I'm trying to find the substring of a list of URLs but having trouble due to an extra character (forward slash) at the end of one of the URL. Below are two examples of the data:
-
EDIT:
How would you get just "sub_test" for the last example you have there?
SPLIT_PART
with negative index could be used:WITH cte AS ( SELECT CONCAT('http://', COLUMN1) AS URL FROM VALUES ('www.url1.com/test'), ('www.url1.com/test2/'), ('www.url1.com/test/sub_test/') ) SELECT URL, PARSE_URL(URL) AS parts, PARSE_URL(URL):path::text AS path, SPLIT_PART(TRIM(PARSE_URL(URL):path::text, '/'), '/', -1) AS path3 FROM cte;
Output:
CodePudding user response:
You can just trim off the final slash like this:
select REGEXP_SUBSTR(trim(URL, '/'), '/([^/] )$', 1, 1, 'e', 1) as URL_EXTRACT
Parsing the URL to a JSON as Lukasz showed will cover many more possibilities, but this will be faster for large data sets if this is the only problem to correct.
CodePudding user response:
You can use Positive Lookahead for this task:
[^\/\n] (?=\/?$)
Explanation:
[^\/\n]
: any combination of characters other than slash and new line (your match)(?=\/?$)
: positive lookahead (checks if after your match there's)\/?
: optional slash$
: end string
Try it here.