I need to write a regex that matches certain endpoints. Here is an example:
INPUT:
https://www.pippo.com/tt-tt/vision-guide/kids-vision-eyecare-101
https://www.pippo.com/tt-tt/vision-guide/lenses-205
https://www.pippo.com/tt-tt/vision-guide/kids-eye-exam
https://www.pippo.com/CategoryLanding?storeId=10851&urlRequestType=Base&categoryId=99930171&langId=-1&catalogId=11651
https://www.pippo.com/ee-ee/send-email
https://www.pippo.com/ff-ff/ray-ban/8053672153743
https://www.pippo.com/as-us/vision-guide/progressives
OUTPUT:
https://www.pippo.com/tt-tt/vision-guide/kids-vision-eyecare-101
https://www.pippo.com/tt-tt/vision-guide/lenses-205
https://www.pippo.com/tt-tt/vision-guide/kids-eye-exam
https://www.pippo.com/ee-ee/send-email
https://www.pippo.com/as-us/vision-guide/progressives
I wrote the following regex:
^.+[^-][^0-9]+$
but it doesn't work as expected.
Can you help me?
Thank you very much.
Answer:
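At the end of your pattern you use 2 negated character classes, [^-][^0-9], meaning "not -" followed by "not a digit". Because [^0-9] sits right before $, the last character must be a non-digit, so every URL ending in a digit is rejected. That includes kids-vision-eyecare-101 and lenses-205, which you want to keep.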
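A quick way to see the problem is to run the question's pattern against the sample input. This Python sketch assumes the intended pattern is ^.+[^-][^0-9]+$ (the names QUESTION_PATTERN and urls are just for illustration). Only URLs whose last character is a non-digit survive, so the two wanted URLs ending in 101 and 205 are dropped.

```python
import re

# Assumed reconstruction of the pattern from the question.
QUESTION_PATTERN = re.compile(r"^.+[^-][^0-9]+$")

urls = [
    "https://www.pippo.com/tt-tt/vision-guide/kids-vision-eyecare-101",
    "https://www.pippo.com/tt-tt/vision-guide/lenses-205",
    "https://www.pippo.com/tt-tt/vision-guide/kids-eye-exam",
    "https://www.pippo.com/CategoryLanding?storeId=10851&urlRequestType=Base&categoryId=99930171&langId=-1&catalogId=11651",
    "https://www.pippo.com/ee-ee/send-email",
    "https://www.pippo.com/ff-ff/ray-ban/8053672153743",
    "https://www.pippo.com/as-us/vision-guide/progressives",
]

# [^0-9]+$ forces the last character to be a non-digit, so any URL
# ending in a digit is dropped, whether or not it should be kept.
for url in urls:
    status = "match" if QUESTION_PATTERN.match(url) else "no match"
    print(f"{status:8}  {url}")
```

Only kids-eye-exam, send-email and progressives match; the pattern looks at the final character, not at what follows the last / or =.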
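Looking at the expected result, you want to match every URL that does not end with / or = followed by only digits. One way to express that:
^https?://\S*[/=](?!\d+$)[^/=]+$
Here \S*[/=] runs up to the last / or =, the negative lookahead (?!\d+$) rejects a remainder made only of digits, and [^/=]+$ ensures that / or = really is the last one.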
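To check this pattern against the sample, here is a minimal Python sketch using re.match on each URL (the names ANSWER1_PATTERN, urls and kept are illustrative). It should keep exactly the five URLs from the expected output.

```python
import re

# The text after the final "/" or "=" must not consist of digits only.
ANSWER1_PATTERN = re.compile(r"^https?://\S*[/=](?!\d+$)[^/=]+$")

urls = [
    "https://www.pippo.com/tt-tt/vision-guide/kids-vision-eyecare-101",
    "https://www.pippo.com/tt-tt/vision-guide/lenses-205",
    "https://www.pippo.com/tt-tt/vision-guide/kids-eye-exam",
    "https://www.pippo.com/CategoryLanding?storeId=10851&urlRequestType=Base&categoryId=99930171&langId=-1&catalogId=11651",
    "https://www.pippo.com/ee-ee/send-email",
    "https://www.pippo.com/ff-ff/ray-ban/8053672153743",
    "https://www.pippo.com/as-us/vision-guide/progressives",
]

# CategoryLanding ends in "=11651" and ray-ban ends in "/8053672153743",
# so the lookahead rejects both; every other URL is kept.
kept = [url for url in urls if ANSWER1_PATTERN.match(url)]
for url in kept:
    print(url)
```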
Answer:
Use:
^(?!.*/\d+$)[^?\n]+$
EXPLANATION
^          the beginning of the string
(?!        look ahead to see if there is not:
  .*       any character except \n (0 or more times, matching the most possible)
  /        '/'
  \d+      digits (0-9) (1 or more times, matching the most possible)
  $        before an optional \n, and the end of the string
)          end of look-ahead
[^?\n]+    any character except '?' and '\n' (newline) (1 or more times,
           matching the most possible)
$          before an optional \n, and the end of the string
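To see the pattern at work on the whole input at once, here is a minimal Python sketch. It assumes the URLs arrive as one multi-line string and compiles the pattern with re.MULTILINE so that ^ and $ apply per line (the names ANSWER2_PATTERN, text, matches and single are illustrative). It also shows the "before an optional \n" behaviour of $ described above.

```python
import re

# MULTILINE makes ^ and $ match at the start and end of each line.
ANSWER2_PATTERN = re.compile(r"^(?!.*/\d+$)[^?\n]+$", re.MULTILINE)

text = """\
https://www.pippo.com/tt-tt/vision-guide/kids-vision-eyecare-101
https://www.pippo.com/tt-tt/vision-guide/lenses-205
https://www.pippo.com/tt-tt/vision-guide/kids-eye-exam
https://www.pippo.com/CategoryLanding?storeId=10851&urlRequestType=Base&categoryId=99930171&langId=-1&catalogId=11651
https://www.pippo.com/ee-ee/send-email
https://www.pippo.com/ff-ff/ray-ban/8053672153743
https://www.pippo.com/as-us/vision-guide/progressives
"""

# The lookahead drops the ray-ban line (ends in "/" plus digits) and
# [^?\n]+ drops the CategoryLanding line (contains "?"). Excluding "\n"
# also keeps a match from spilling onto the next line.
matches = ANSWER2_PATTERN.findall(text)
print("\n".join(matches))

# Without MULTILINE, $ still matches just before a single trailing
# newline: this is the "before an optional \n" in the explanation.
single = "https://www.pippo.com/ee-ee/send-email\n"
print(bool(re.match(r"^(?!.*/\d+$)[^?\n]+$", single)))  # True
```

Excluding \n from the character class matters mainly in this multi-line setting; when each URL is tested on its own, [^?]+ would behave the same.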