Home > Enterprise >  Finding Substring with Different Ending
Finding Substring with Different Ending

Time:06-04

I'm trying to find the substring of a list of URLs but having trouble due to an extra character (forward slash) at the end of one of the URL. Below are two examples of the data:

  • enter image description here


    EDIT:

    How would you get just "sub_test" for the last example you have there?

    SPLIT_PART with negative index could be used:

    WITH cte AS (
      SELECT CONCAT('http://', COLUMN1) AS URL
      FROM VALUES   ('www.url1.com/test'),
                    ('www.url1.com/test2/'), 
                    ('www.url1.com/test/sub_test/')
    )
    SELECT URL, 
           PARSE_URL(URL) AS parts,
           PARSE_URL(URL):path::text AS path,
           SPLIT_PART(TRIM(PARSE_URL(URL):path::text, '/'), '/', -1) AS path3
    FROM cte;
    

    Output:

    enter image description here

    CodePudding user response:

    You can just trim off the final slash like this:

    select REGEXP_SUBSTR(trim(URL, '/'), '/([^/] )$', 1, 1, 'e', 1) as URL_EXTRACT
    

    Parsing the URL to a JSON as Lukasz showed will cover many more possibilities, but this will be faster for large data sets if this is the only problem to correct.

    CodePudding user response:

    You can use Positive Lookahead for this task:

    [^\/\n] (?=\/?$)
    

    Explanation:

    • [^\/\n] : any combination of characters other than slash and new line (your match)
    • (?=\/?$): positive lookahead (checks if after your match there's)
      • \/?: optional slash
      • $: end string

    Try it here.

  • Related