Match URL string on different situations with a single regex

Time:06-09

I am trying to match a url in four different situations:

With no attributes

<a href="example.com/reviews/audi/a6/">Link without other attr</a>

With other attributes

<a href="example.com/reviews/audi/a6/" >Link with other attr</a>

With no standard href

<span data-link="example.com/reviews/audi/a6/">Link with no href</span>

Just the URL

example.com/reviews/audi/a6

In all of them I want to do the same thing: move reviews to the end of the URL, without ending up with an extra /

I am using this regex to handle the ones that have another attribute, by identifying the space after the closing "

("example\.com)\/(reviews|used-cars)\/(.*[^\/$])(\/?)(" )

But if the value ends in "> instead, adapting the regex like this messes up and matches up to the end of the class attribute

("example\.com)\/(reviews|used-cars)\/(.*[^\/$])(\/?)(">)

https://regex101.com/r/9xbdme/1

CodePudding user response:

You can use

Find:       ("?example\.com)/(reviews|used-cars)/([^"\s]*[^/"\s])/?("[\s>])?
Replace: $1/$3/$2/$4

See the regex demo.

Details:

  • ("?example\.com) - Group 1: an optional " followed by the string example.com
  • / - a slash
  • (reviews|used-cars) - Group 2: reviews or used-cars string
  • / - a slash
  • ([^"\s]*[^/"\s]) - Group 3: zero or more chars other than whitespace and " (as many as possible), then one char other than whitespace, " and /, so the group never ends with a slash
  • /? - an optional slash
  • ("[\s>])? - Group 4 (optional): a " and then a > or whitespace.
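
As a quick check of the pattern above, here is a minimal Python sketch that runs it over the four inputs from the question. The pattern is copied unchanged; the \1-style backreferences are just Python's spelling of $1, and the function name move_section_to_end and the samples list are made up for illustration. It assumes Python 3.5 or later, where an optional group that did not take part in the match is substituted as an empty string.

```python
import re

# Pattern copied from the answer; only the replacement syntax is adapted to Python.
PATTERN = re.compile(
    r'("?example\.com)/(reviews|used-cars)/([^"\s]*[^/"\s])/?("[\s>])?'
)

# Python equivalent of the answer's "$1/$3/$2/$4".
REPLACEMENT = r'\1/\3/\2/\4'


def move_section_to_end(text: str) -> str:
    """Hypothetical helper: move reviews/used-cars behind the rest of the path."""
    return PATTERN.sub(REPLACEMENT, text)


# The four situations from the question.
samples = [
    '<a href="example.com/reviews/audi/a6/">Link without other attr</a>',
    '<a href="example.com/reviews/audi/a6/" >Link with other attr</a>',
    '<span data-link="example.com/reviews/audi/a6/">Link with no href</span>',
    'example.com/reviews/audi/a6',
]

for sample in samples:
    print(move_section_to_end(sample))
```

With the replacement as written, every result ends in /reviews/ (the bare URL becomes example.com/audi/a6/reviews/). If you do not want that trailing slash, dropping the last / from the replacement should be enough.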