Regex for endpoint

Time: 12-03

I need to write a regex that matches certain endpoints. Below is an example:

INPUT:

https://www.pippo.com/tt-tt/vision-guide/kids-vision-eyecare-101
https://www.pippo.com/tt-tt/vision-guide/lenses-205
https://www.pippo.com/tt-tt/vision-guide/kids-eye-exam
https://www.pippo.com/CategoryLanding?storeId=10851&urlRequestType=Base&categoryId=99930171&langId=-1&catalogId=11651
https://www.pippo.com/ee-ee/send-email
https://www.pippo.com/ff-ff/ray-ban/8053672153743
https://www.pippo.com/as-us/vision-guide/progressives

OUTPUT:

https://www.pippo.com/tt-tt/vision-guide/kids-vision-eyecare-101
https://www.pippo.com/tt-tt/vision-guide/lenses-205
https://www.pippo.com/tt-tt/vision-guide/kids-eye-exam
https://www.pippo.com/ee-ee/send-email
https://www.pippo.com/as-us/vision-guide/progressives

I wrote the following regex, ^.+[^-][^0-9]+$, but it doesn't work well.

Can you help me?

Thank you very much.

Answer:

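In your pattern, you use two negated character classes at the end, [^-][^0-9], meaning "not a -" followed by "not a digit". Because the last character can therefore never be a digit, the pattern also rejects URLs you want to keep, such as kids-vision-eyecare-101 and lenses-205.

Judging by the expected result, you want to match all URLs that do not end with a / or = followed by only digits: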

^https?://\S*[/=](?!\d+$)[^/=]+$

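To check this pattern against the question's sample, here is a minimal Python harness. Python is my choice here (the question doesn't name a language or regex engine), and the sketch assumes `+` (one or more) for the repeated parts of both the question's pattern and this answer's pattern. It runs each pattern line by line and lists what it keeps.

```python
import re

# Sample URLs copied from the question.
URLS = [
    "https://www.pippo.com/tt-tt/vision-guide/kids-vision-eyecare-101",
    "https://www.pippo.com/tt-tt/vision-guide/lenses-205",
    "https://www.pippo.com/tt-tt/vision-guide/kids-eye-exam",
    "https://www.pippo.com/CategoryLanding?storeId=10851&urlRequestType=Base&categoryId=99930171&langId=-1&catalogId=11651",
    "https://www.pippo.com/ee-ee/send-email",
    "https://www.pippo.com/ff-ff/ray-ban/8053672153743",
    "https://www.pippo.com/as-us/vision-guide/progressives",
]

# The question's attempt (assuming '+' quantifiers): the last character
# must be a non-digit, so any URL ending in a digit is rejected.
QUESTION_PATTERN = re.compile(r"^.+[^-][^0-9]+$")

# This answer's pattern (assuming '+' quantifiers): the text after the
# last '/' or '=' must not consist of digits only.
ANSWER_PATTERN = re.compile(r"^https?://\S*[/=](?!\d+$)[^/=]+$")


def keep(pattern, urls):
    """Return the URLs the pattern matches, preserving input order."""
    return [u for u in urls if pattern.search(u)]


if __name__ == "__main__":
    print("question pattern keeps:")
    for url in keep(QUESTION_PATTERN, URLS):
        print("  " + url)
    print("answer pattern keeps:")
    for url in keep(ANSWER_PATTERN, URLS):
        print("  " + url)
```

Run as-is, the question's pattern keeps only kids-eye-exam, send-email and progressives, while the answer's pattern keeps exactly the five URLs listed under OUTPUT.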

Answer:

Use:

^(?!.*/\d+$)[^?\n]+$


EXPLANATION

--------------------------------------------------------------------------------
  ^                        the beginning of the string
--------------------------------------------------------------------------------
  (?!                      look ahead to see if there is not:
--------------------------------------------------------------------------------
    .*                       any character except \n (0 or more times
                             (matching the most amount possible))
--------------------------------------------------------------------------------
    /                        '/'
--------------------------------------------------------------------------------
    \d+                      digits (0-9) (1 or more times (matching
                             the most amount possible))
--------------------------------------------------------------------------------
    $                        before an optional \n, and the end of
                             the string
--------------------------------------------------------------------------------
  )                        end of look-ahead
--------------------------------------------------------------------------------
  [^?\n]+                  any character except: '?', '\n' (newline)
                           (1 or more times (matching the most amount
                           possible))
--------------------------------------------------------------------------------
  $                        before an optional \n, and the end of the
                           string
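Because the character class excludes \n, the pattern appears designed to run over the whole input at once in multiline mode. That is an inference from the pattern itself; the answer does not say which engine or flags it used. A minimal Python sketch of that usage, assuming re.MULTILINE so that ^ and $ anchor at every line boundary:

```python
import re

# The question's input as one multi-line string, one URL per line.
TEXT = """\
https://www.pippo.com/tt-tt/vision-guide/kids-vision-eyecare-101
https://www.pippo.com/tt-tt/vision-guide/lenses-205
https://www.pippo.com/tt-tt/vision-guide/kids-eye-exam
https://www.pippo.com/CategoryLanding?storeId=10851&urlRequestType=Base&categoryId=99930171&langId=-1&catalogId=11651
https://www.pippo.com/ee-ee/send-email
https://www.pippo.com/ff-ff/ray-ban/8053672153743
https://www.pippo.com/as-us/vision-guide/progressives
"""

# The lookahead rejects lines ending in '/' plus digits only; the class
# [^?\n]+ rejects lines containing '?' and keeps each match on one line.
PATTERN = re.compile(r"^(?!.*/\d+$)[^?\n]+$", re.MULTILINE)

# The pattern has no capturing groups, so findall returns whole matches.
matches = PATTERN.findall(TEXT)
for url in matches:
    print(url)
```

Without re.MULTILINE, ^ and $ would anchor only at the start and end of the whole string, and since [^?\n]+ cannot cross a newline, the pattern would find nothing in this multi-line input.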