Regex part of URL string

Time:02-24

I have the following strings

https://test.com/fi/wp-content
https://test.com/fr
https://test.com/es
https://test.com/
https://test.com/wp-content/
https://test.com/image.png
https://test.com/de/wp-content/themes
https://test.com/es
https://test.com/fr
https://test.com/no
https://test.com/da
https://test.com/en
https://test.com/de
https://test.com/nl/wp-content
https://test.com/fi

So far I have the following regex:

/\btest.com.*\.*(?<!fr|es|da|no|en|de|nl|fi)$/gm

I want to match the following (image 1):

I'm almost there, but my regex matches everything after my expression, like this (image 2):

I can't seem to figure out how to get the end of my regex to behave so that it produces the match shown in image 1. Here is a regex101: https://regex101.com/r/Tv0AjJ/1

Answer:

Currently, the part .*(?<!fr|es|da|no|en|de|nl|fi)$ matches up to the end of the string and then asserts that the two characters directly to the left of that position are not any of the alternatives. That is why a string ending in /es does not match, but one ending in .png does.
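To see this outside regex101, here is a minimal Python sketch that runs the pattern from the question over three of the sample strings. The helper name original_match is made up for illustration, and Python's re module is only an assumed stand-in for the engine selected on regex101 (it accepts this lookbehind because every alternative is exactly two characters long).

```python
import re

# The pattern from the question, unchanged (the dot in "test.com" is still unescaped).
ORIGINAL = re.compile(r"\btest.com.*\.*(?<!fr|es|da|no|en|de|nl|fi)$")


def original_match(url):
    """Return the text matched by the question's pattern, or None."""
    m = ORIGINAL.search(url)
    return m.group(0) if m else None


for url in (
    "https://test.com/es",
    "https://test.com/image.png",
    "https://test.com/fi/wp-content",
):
    print(url, "->", original_match(url))
```

Because the lookbehind only inspects the last two characters before the end, /fi/wp-content is matched in full even though its path starts with a language code.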

You can match the / and then use a negative lookahead, (?!...), to assert that none of the alternatives is directly to the right of it.

Note that the dot must be escaped (\.) so that it matches a literal dot.

\btest\.com\/(?!fr|es|d[ae]|no|en|nl|fi)
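As a rough check, the lookahead pattern can be used to filter some of the strings from the question. This is a sketch in Python (an assumption, since the question does not name a language), and without_language_prefix is a hypothetical helper name.

```python
import re

# Lookahead version from the answer: match "test.com/" only when the path
# does not start with one of the language codes.
NO_LANG = re.compile(r"\btest\.com\/(?!fr|es|d[ae]|no|en|nl|fi)")

# A subset of the sample strings from the question.
URLS = [
    "https://test.com/fi/wp-content",
    "https://test.com/fr",
    "https://test.com/",
    "https://test.com/wp-content/",
    "https://test.com/image.png",
    "https://test.com/de/wp-content/themes",
    "https://test.com/nl/wp-content",
]


def without_language_prefix(urls):
    """Keep only the URLs in which the pattern finds a match."""
    return [u for u in urls if NO_LANG.search(u)]


print(without_language_prefix(URLS))
```

Only the root URL, /wp-content/ and /image.png pass the filter. The match itself is just test.com/, since the lookahead consumes no characters.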


If you don't want partial matches (for example, a path such as /france being excluded only because it starts with fr), you can group the alternatives and follow the group with either a word boundary \b or, as Wiktor Stribiżew mentioned in the comments, a forward slash or the end of the string: (?:\/|$)

Alternatives that start with the same character can be combined using a character class: d[ae] matches both da and de.

\btest\.com\/(?!(?:fr|es|d[ae]|no|en|nl|fi)\b)
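The difference between the variants shows up on paths that merely start with a language code. The sketch below compares them in Python. The URLs /notes and /fr-ca are invented for illustration and do not come from the question, and the third pattern is one assumed way of writing the (?:\/|$) suggestion out in full.

```python
import re

CODES = r"(?:fr|es|d[ae]|no|en|nl|fi)"

# Excludes any path that merely starts with a language code.
PREFIX_ONLY = re.compile(r"\btest\.com\/(?!" + CODES + r")")
# Excludes a code only when it is followed by a word boundary.
WORD_BOUNDARY = re.compile(r"\btest\.com\/(?!" + CODES + r"\b)")
# Excludes a code only when it is followed by "/" or the end of the string.
SLASH_OR_END = re.compile(r"\btest\.com\/(?!" + CODES + r"(?:\/|$))")


def matches(pattern, url):
    """True if the pattern finds a match anywhere in the URL."""
    return pattern.search(url) is not None


for url in (
    "https://test.com/fr",
    "https://test.com/fr/page",
    "https://test.com/notes",   # hypothetical: starts with "no"
    "https://test.com/fr-ca",   # hypothetical: "fr" followed by "-"
):
    print(
        url,
        matches(PREFIX_ONLY, url),
        matches(WORD_BOUNDARY, url),
        matches(SLASH_OR_END, url),
    )
```

With \b, /notes is matched because "no" is followed by a word character, while /fr-ca is still excluded because "-" creates a word boundary. The (?:\/|$) form matches both, so it is the stricter choice if only whole path segments should count as language codes.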