We have URLs in the formats below. I want to get only the digit values between the strings I specified. I tried a pattern like this: (?<=\/sub.example.com\/)(.*)(?=\?[Uu]rl|$)
but it does not give the result I want.
https://sub.example.com/79084/t/64931?Url=https://www.test.com/path/otherpath/
https://sub.example.com/79084/t/64931
Expected results:
[ 79084, 64931 ]
I need to exclude /t/
CodePudding user response:
Given the sample URLs in the question, it should be sufficient to simply match one or more digits preceded by a slash:
(?<=/)\d+
Demo: https://regexr.com/6tia6
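As a rough sketch of how this pattern might be applied in JavaScript (the helper name digitsAfterSlash and the urls array are made up for illustration; the two URLs are the samples from the question):

```javascript
// Sample URLs taken from the question.
const urls = [
  "https://sub.example.com/79084/t/64931?Url=https://www.test.com/path/otherpath/",
  "https://sub.example.com/79084/t/64931",
];

// Hypothetical helper: collect every run of digits that directly follows a slash.
function digitsAfterSlash(url) {
  // The g flag makes match() return all matches instead of only the first one.
  const matches = url.match(/(?<=\/)\d+/g) ?? [];
  return matches.map(Number);
}

for (const url of urls) {
  console.log(digitsAfterSlash(url)); // [ 79084, 64931 ] for both sample URLs
}
```

Keep in mind that this does not check the domain at all, so a number following a slash anywhere in the string, including inside the Url parameter, would be picked up as well.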
CodePudding user response:
Using the variable-length lookbehind support in JavaScript, you can use this regex:
(?<=\/sub\.example\.com\/(?:[^\/]*\/)*)\d+(?=(?:\/[^\/]*)*(?:\?[Uu]rl|$))
Note that it will match every number in the path after the domain name, e.g. https://sub.example.com/79084/t/64931/1234/6789
will produce 4 matches, one for each number.
RegEx Breakup:
(?<=\/sub\.example\.com\/(?:[^\/]*\/)*): Lookbehind to assert the presence of sub.example.com/ followed by 0 or more path components, each ending with /
\d+: Match 1+ digits
(?=(?:\/[^\/]*)*(?:\?[Uu]rl|$)): Lookahead to assert that the digits are followed by 0 or more path components, each starting with /, and then by ?Url or by the end of the line
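To try this out, something along these lines should work in any engine that supports lookbehind (ES2018 and later). The names DIGITS_IN_PATH and numbersAfterDomain are only illustrative:

```javascript
// Digits that start a path component after sub.example.com/ and are followed
// by more path components, by ?Url / ?url, or by the end of the input.
const DIGITS_IN_PATH =
  /(?<=\/sub\.example\.com\/(?:[^\/]*\/)*)\d+(?=(?:\/[^\/]*)*(?:\?[Uu]rl|$))/g;

// Hypothetical helper returning every matched number in a URL.
function numbersAfterDomain(url) {
  // matchAll requires a global regex; m[0] is the matched run of digits.
  return Array.from(url.matchAll(DIGITS_IN_PATH), (m) => Number(m[0]));
}

const samples = [
  "https://sub.example.com/79084/t/64931?Url=https://www.test.com/path/otherpath/",
  "https://sub.example.com/79084/t/64931",
  "https://sub.example.com/79084/t/64931/1234/6789",
];

for (const url of samples) {
  console.log(numbersAfterDomain(url));
}
// [ 79084, 64931 ]
// [ 79084, 64931 ]
// [ 79084, 64931, 1234, 6789 ]
```

Because the lookbehind anchors on sub.example.com/, a URL on another host yields no matches at all.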
CodePudding user response:
If all your URLs have this same format, digits/anything/digits, then you can change your .*
to be more specific:
(?<=\/sub.example.com\/)(\d+)\/(.*)\/(\d+)(?=\?[Uu]rl|$)
So changing it to (\d+)\/(.*)\/(\d+)
allows you to get each set of digits as a matched group (groups 1 and 3).
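For example, reading the groups in JavaScript could look roughly like this (TWO_NUMBERS and firstAndLastNumber are made-up names, and the sketch assumes exactly the digits/anything/digits shape described above):

```javascript
// Group 1: leading digits, group 2: whatever sits in between, group 3: trailing digits.
const TWO_NUMBERS = /(?<=\/sub.example.com\/)(\d+)\/(.*)\/(\d+)(?=\?[Uu]rl|$)/;

// Hypothetical helper: returns both numbers, or an empty array when the URL
// does not have the expected shape.
function firstAndLastNumber(url) {
  const m = url.match(TWO_NUMBERS);
  if (m === null) return [];
  // Group 2 holds the middle part (e.g. "t"), so it is skipped here.
  return [Number(m[1]), Number(m[3])];
}

console.log(
  firstAndLastNumber(
    "https://sub.example.com/79084/t/64931?Url=https://www.test.com/path/otherpath/"
  )
); // [ 79084, 64931 ]
console.log(firstAndLastNumber("https://sub.example.com/79084/t/64931")); // [ 79084, 64931 ]
```

Unlike the match-all approaches, this returns one match per URL with the numbers in fixed group positions, which is convenient when the format is known in advance.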