How to get only digits between two strings with regex?

Time:09-08

We have URLs in the following formats, and I want to get only the digit values between the strings I specified. I tried a pattern like (?<=\/sub.example.com\/)(.*)(?=\?[Uu]rl|$), but it does not give the result I want:

https://sub.example.com/79084/t/64931?Url=https://www.test.com/path/otherpath/
https://sub.example.com/79084/t/64931

Expected results: [ 79084, 64931 ]

I need to exclude /t/

https://regexr.com/6ti8p

CodePudding user response:

Given the sample URLs in the question it should be sufficient to simply match digits preceded by a slash:

(?<=/)\d+

Demo: https://regexr.com/6tia6
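As a quick check, here is a minimal JavaScript sketch of that pattern run against the two sample URLs from the question. The names urls, slashDigits and extractSlashDigits are made up for illustration, and the slash is escaped because the pattern is written as a regex literal. It assumes an engine with lookbehind support (ES2018 or later).

```javascript
// Sample URLs copied from the question.
const urls = [
  "https://sub.example.com/79084/t/64931?Url=https://www.test.com/path/otherpath/",
  "https://sub.example.com/79084/t/64931",
];

// One or more digits immediately preceded by a slash.
const slashDigits = /(?<=\/)\d+/g;

// Hypothetical helper: returns every run of digits that follows a slash.
function extractSlashDigits(url) {
  // match() with the g flag returns an array of matches, or null if none.
  return url.match(slashDigits) ?? [];
}

for (const url of urls) {
  console.log(extractSlashDigits(url)); // [ '79084', '64931' ] for both samples
}
```

Keep in mind this looks at the whole string, so digits after a slash inside the Url parameter would be picked up as well. That does not happen with the sample URLs, but it may with others.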

CodePudding user response:

JavaScript supports variable-length lookbehind, so you can use this regex:

(?<=\/sub\.example\.com\/(?:[^\/]*\/)*)\d+(?=(?:\/[^\/]*)*(?:\?[Uu]rl|$))


Note that it matches every number in the path after the domain name, e.g. https://sub.example.com/79084/t/64931/1234/6789 yields 4 matches, one per number.

RegEx Breakup:

  • (?<=\/sub\.example\.com\/(?:[^\/]*\/)*): Lookbehind to assert presence of sub.example.com/ followed by 0 or more repeats of path components separated with /
  • \d+: Match 1+ digits
  • (?=(?:\/[^\/]*)*(?:\?[Uu]rl|$)): Must be followed by 0 or more repeats of path components separated with / and that must be followed by ?Url or line end.
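To see the pieces above working together, here is a small JavaScript sketch that applies the regex with matchAll. The names pathDigits and extractPathDigits are invented for this example, and it assumes an engine that supports variable-length lookbehind (ES2018 or later, e.g. a current Node.js).

```javascript
// Lookbehind: "/sub.example.com/" plus zero or more "segment/" repeats.
// Lookahead: zero or more "/segment" repeats, then "?Url"/"?url" or end of string.
const pathDigits =
  /(?<=\/sub\.example\.com\/(?:[^\/]*\/)*)\d+(?=(?:\/[^\/]*)*(?:\?[Uu]rl|$))/g;

// Hypothetical helper: collects every all-digit path segment after the domain.
function extractPathDigits(url) {
  return Array.from(url.matchAll(pathDigits), (m) => m[0]);
}

console.log(
  extractPathDigits(
    "https://sub.example.com/79084/t/64931?Url=https://www.test.com/path/otherpath/"
  )
); // [ '79084', '64931' ]

console.log(extractPathDigits("https://sub.example.com/79084/t/64931/1234/6789"));
// [ '79084', '64931', '1234', '6789' ]
```

Because the lookbehind ends right after a slash and the lookahead requires a slash, ?Url or the end of the string, a mixed segment such as 123abc should not produce a match.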

CodePudding user response:

If all your URLs share the same format, digits/anything/digits, then you can make your .* more specific:

(?<=\/sub.example.com\/)(\d+)\/(.*)\/(\d+)(?=\?[Uu]rl|$)

So changing it to (\d+)\/(.*)\/(\d+) allows you to get each set of digits as a captured group.

https://regex101.com/r/dv7BEv/2
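For reference, a rough JavaScript sketch of reading those groups: group 1 is the first number, group 2 is whatever sits between the slashes, and group 3 is the second number. The names twoNumbers and extractTwoNumbers are only illustrative, and the dots in the host name are escaped here so they match literally.

```javascript
// Group 1: first number, group 2: middle part (e.g. "t"), group 3: second number.
const twoNumbers = /(?<=\/sub\.example\.com\/)(\d+)\/(.*)\/(\d+)(?=\?[Uu]rl|$)/;

// Hypothetical helper: returns the two numbers, or an empty array if the
// URL does not have the digits/anything/digits shape.
function extractTwoNumbers(url) {
  const m = url.match(twoNumbers);
  return m ? [Number(m[1]), Number(m[3])] : [];
}

console.log(
  extractTwoNumbers(
    "https://sub.example.com/79084/t/64931?Url=https://www.test.com/path/otherpath/"
  )
); // [ 79084, 64931 ]

console.log(extractTwoNumbers("https://sub.example.com/79084/t/64931"));
// [ 79084, 64931 ]
```

Unlike the global-match approaches, this only works when the URL has exactly this shape; anything else comes back as an empty array in this sketch.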
