Regular expression to exactly match the last path segment of an URL without parameters, except if th

Time:09-30

The goal of my regular expression adventure is to create a matcher for a mechanism that adds a trailing slash to URLs, even when the path is followed by a query string (?) or a fragment (#) at the end of the URL.

For any of the following URLs, I'm looking for a match for segment as follows:

  1. https://example.com/what-not/segment matches segment
  2. https://example.com/what-not/segment?a=b matches segment
  3. https://example.com/what-not/segment#a matches segment

In case there is a match for segment, I'm going to replace it with segment/.

For any of the following URLs, there should be no match:

  1. https://example.com/what-not/segment/ no match
  2. https://example.com/what-not/segment/?a=b no match
  3. https://example.com/what-not/segment/#a no match

because here, there is already a trailing slash.

I've tried:

  1. This primitive regex and its variants: .*\/([^?#\/]+). However, with this approach, I could not stop it from matching when there is already a trailing slash.
  2. I experimented with negative lookaheads as follows: ([^\/\#\?]+)(?!(.*[\#\?].*))$. In this case, I could not properly exclude the ? or # part from the match.
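
To illustrate where the two attempts go wrong, here is a minimal Python sketch. It assumes both patterns were meant to use the quantifier + after the character class; the helper name captured is made up for this illustration.

```python
import re

# Assumption: both patterns were meant to use the "+" quantifier.
ATTEMPT_1 = re.compile(r".*\/([^?#\/]+)")
ATTEMPT_2 = re.compile(r"([^\/\#\?]+)(?!(.*[\#\?].*))$")

def captured(pattern, url):
    """Return what group 1 captures, or None if the pattern finds no match."""
    m = pattern.search(url)
    return m.group(1) if m else None

# Attempt 1: the greedy .* backtracks to the slash before "segment",
# so the trailing slash does not prevent a match.
print(captured(ATTEMPT_1, "https://example.com/what-not/segment/"))     # segment

# Attempt 2: the $ anchor forces the match to the end of the string,
# so the query string is captured instead of the path segment.
print(captured(ATTEMPT_2, "https://example.com/what-not/segment?a=b"))  # a=b
```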

Thank you for your kind help!

CodePudding user response:

Lookahead and lookbehind conditionals are so powerful!

(?<=\/)[\w]+(?(?=[\?\#])|$)

P.S.: I just used [\w]+, which means [a-zA-Z0-9_]+.
Of course, URLs can contain many other characters such as - or ~, but for the examples provided it works nicely.
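
For anyone who wants to try this in Python, here is a rough sketch. Python's re module does not support conditionals with a lookahead as the condition, so this swaps the conditional for a plain lookahead, (?=[?#]|$), which should accept the same positions. It also widens [\w]+ to [^/?#]+ so that segments containing - or ~ match as well. The names LAST_SEGMENT and add_trailing_slash are made up for this illustration.

```python
import re

# Assumption: a plain lookahead stands in for the conditional (?(?=[?#])|$),
# because Python's re only supports conditionals on group references.
# [^/?#]+ is used instead of [\w]+ so that "-" and "~" are allowed too.
LAST_SEGMENT = re.compile(r"(?<=/)[^/?#]+(?=[?#]|$)")

def add_trailing_slash(url: str) -> str:
    """Append '/' to the last path segment unless a slash is already there."""
    # count=1 limits the replacement to the first (leftmost) match.
    return LAST_SEGMENT.sub(lambda m: m.group(0) + "/", url, count=1)

for url in (
    "https://example.com/what-not/segment",
    "https://example.com/what-not/segment?a=b",
    "https://example.com/what-not/segment#a",
    "https://example.com/what-not/segment/",
    "https://example.com/what-not/segment/?a=b",
    "https://example.com/what-not/segment/#a",
):
    print(url, "->", add_trailing_slash(url))
```

The first three URLs gain a slash after segment and the last three come back unchanged. Note that this sketch only covers the URLs from the question: a query string or fragment that itself contains a / is not handled.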
