I want to extract "page" and the page number from a URL with a regex. There are a couple of variations of how the page number appears:
fghghdsfs/page4
fghghdsfs/page-4
sfgsfgsfg/page=4
hteheth/page-4/
dhdghgd/page=4/
dghdghdh/page/4/
dghdghdh/page/4
fghghdsfs?page4
dhdghd?page-4
dghdg?page-4/
eyeyt?page=4
etyetyet?page=4/
nvnndgnd?page/4/
The page number should have between 1 and 3 digits.
I have tried this regex, but I have a problem with matching the / separator:
(=|\?|\/)(page)(_|-|=|\d{1,3}|\/)
CodePudding user response:
There are two problems with the regex you have:
- \d{1,3} is inside the parentheses, as one of the alternatives. You're saying: page followed by either a separator or the page number, so page/4 stops matching at the / and never reaches the digit. Put \d{1,3} after the parentheses, and make it a capture group so you can extract it later.
- Once the digits are moved out, the group with separators is required, so page4 does not match. Put a ? after the group to make it optional.
Fixing those:
(=|\?|\/)(page)(_|-|=|\/)?(\d{1,3})
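As a rough illustration, the corrected pattern could be applied like this in Python. The question does not name a language, so Python and the helper name extract_page are assumptions made for this sketch only:

```python
import re

# Corrected pattern from this answer: group 2 is the literal "page",
# group 4 is the 1-3 digit page number.
PATTERN = re.compile(r"(=|\?|\/)(page)(_|-|=|\/)?(\d{1,3})")

def extract_page(url):
    """Return ("page", number) for the first match in url, or None.

    extract_page is a hypothetical helper, not part of the original answer.
    """
    match = PATTERN.search(url)
    if match is None:
        return None
    return match.group(2), int(match.group(4))

# A few of the URL variants from the question.
for url in ["fghghdsfs/page4", "hteheth/page-4/", "eyeyt?page=4", "nvnndgnd?page/4/"]:
    print(url, extract_page(url))
```

Because the separator group is optional, group 3 is simply None for URLs such as fghghdsfs/page4, while group 4 always holds the digits.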
CodePudding user response:
You may use this regex:
[=?/]page[_=/-]?(\d{1,3})
RegEx Details:
[=?/] : Match = or ? or /
page : Match the string page
[_=/-]? : Optionally match _ or = or / or -
(\d{1,3}) : Match 1 to 3 digits
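A minimal Python sketch of this pattern in use follows. Python, the helper page_numbers, and the stricter variant STRICT_PAGE_RE are assumptions added for illustration; they are not part of the answer. The strict variant appends a negative lookahead, (?!\d), on the assumption that a number longer than 3 digits should be rejected rather than cut to its first 3 digits:

```python
import re

# Pattern from the answer above; the single capture group holds the digits.
PAGE_RE = re.compile(r"[=?/]page[_=/-]?(\d{1,3})")

# Hypothetical stricter variant: (?!\d) refuses a match when a further digit
# follows, so a 4-digit number is rejected instead of being truncated.
STRICT_PAGE_RE = re.compile(r"[=?/]page[_=/-]?(\d{1,3})(?!\d)")

def page_numbers(urls, pattern=PAGE_RE):
    """Map each URL to its page number string, or None when nothing matches."""
    result = {}
    for url in urls:
        match = pattern.search(url)
        result[url] = match.group(1) if match else None
    return result

samples = ["fghghdsfs/page4", "dhdghd?page-4", "dghdghdh/page/4/", "x/page1234"]
print(page_numbers(samples))
print(page_numbers(samples, STRICT_PAGE_RE))
```

With the plain pattern, x/page1234 yields 123 because \d{1,3} stops after three digits; with the strict variant it yields None. Which behavior is wanted depends on how over-long numbers should be treated, which the question does not specify.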