The right regex for catching a part of a URL

Time:12-07

Here are some example URLs:

(1) https://m.aaa.kr/category/outer/55/
(2) https://m.aaa.kr/category/inner/5/
(3) https://m.aaa.kr/product/jacket/3031/category/55/display/1/
(4) https://m.aaa.kr/product/shirts/30/category/5/display/1/

I need a regex that captures the "55" or "5" part of these URLs.

What I tried was /(?:\/category\/\w+)(\/category\/)|(\d+[^\/])/g

However, this regex also matches "3031" in case (3) and "30" in case (4), and it cannot match "5" in cases (2) and (4).
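
A minimal sketch reproducing the behavior described above, assuming the attempted regex originally contained the + quantifiers (they appear to have been lost from the post). With the g flag, String.prototype.match returns every full match, or null when there is none:

```javascript
// The four sample URLs from the question.
const urls = [
  'https://m.aaa.kr/category/outer/55/',
  'https://m.aaa.kr/category/inner/5/',
  'https://m.aaa.kr/product/jacket/3031/category/55/display/1/',
  'https://m.aaa.kr/product/shirts/30/category/5/display/1/',
];

// The attempted regex, with the "+" quantifiers assumed to be restored.
const attempt = /(?:\/category\/\w+)(\/category\/)|(\d+[^\/])/g;

// All full matches per URL (null when nothing matches).
const attemptResults = urls.map((url) => url.match(attempt));

for (const [i, url] of urls.entries()) {
  console.log(url, '=>', attemptResults[i]);
}
```

Only the second alternative ever matches here, and \d+[^\/] needs at least two characters before a /, which is why a lone "5" followed by / is never found while "3031" and "30" are.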

How can I fix it so that it works correctly?

User response:

How about capturing the first digit (or digits) directly after /category/ or after /category/someothertext/

with: /\/category\/(\w+\/)?(\d+)/g (the number ends up in Group 2)

You can test it online here: https://regex101.com/r/n4dj1r/1
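
A minimal sketch of this regex applied to the four sample URLs, assuming the + quantifiers shown above. It uses String.prototype.matchAll (which requires the g flag) and keeps Group 2 of each match:

```javascript
// The four sample URLs from the question.
const categoryUrls = [
  'https://m.aaa.kr/category/outer/55/',
  'https://m.aaa.kr/category/inner/5/',
  'https://m.aaa.kr/product/jacket/3031/category/55/display/1/',
  'https://m.aaa.kr/product/shirts/30/category/5/display/1/',
];

// Group 1: the optional "text/" segment; Group 2: the digits we want.
const firstAnswerRx = /\/category\/(\w+\/)?(\d+)/g;

// Collect Group 2 of every match for each URL.
const numbers = categoryUrls.map((url) => [...url.matchAll(firstAnswerRx)].map((m) => m[2]));

console.log(numbers); // [['55'], ['5'], ['55'], ['5']]
```

For "/category/55/display/", the optional group first consumes "55/", then backtracks and is skipped when no digits follow, so Group 2 still gets "55".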

User response:

Note that your /(?:\/category\/\w+)(\/category\/)|(\d+[^\/])/g regex matches multiple occurrences (due to the g flag) of a pattern with two alternatives: either /category/, then one or more word chars, and then /category/ (the last part captured into Group 1), or one or more digits followed by one char other than / (captured into Group 2). This pattern is definitely wrong, as you only want to match and capture digits in Group 2. Also, the first alternative does not seem to match anything meaningful for you at all, as it does not restrict the second alternative in any way.

Also, using \w to match arbitrary text between two slashes is usually not a good idea, as URL parts often contain - chars, which are not word chars.
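
A minimal sketch of the hyphen problem, using a hypothetical URL that is not from the question. The two regexes have the same shape and differ only in the character class for the optional segment:

```javascript
// Hypothetical URL (not from the question) whose category segment contains a hyphen.
const hyphenated = 'https://m.aaa.kr/category/outer-wear/55/';

const wordOnly = /\/category\/(?:\w+\/)?(\d+)/;      // \w cannot match "-"
const withHyphen = /\/category\/(?:[\w-]+\/)?(\d+)/; // [\w-] also allows "-"

const wordOnlyResult = wordOnly.exec(hyphenated);     // null
const withHyphenResult = withHyphen.exec(hyphenated); // Group 1 is "55"

console.log(wordOnlyResult, withHyphenResult && withHyphenResult[1]);
```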

So, you can use one of these:

/\/category\/(?:[\w-]+\/)?(\d+)/
/\/category\/(?:[^\/]+\/)?(\d+)/

Note there is no g flag, since all you need is the first match. Details:

  • \/category\/ - a /category/ string
  • (?:[\w-]+\/)? - an optional non-capturing group matching one or more word or hyphen chars and then a / (in the second pattern, [^\/]+ matches one or more chars other than /; the non-capturing group keeps the match object structure simpler)
  • (\d+) - Group 1: one or more digits.
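
The two patterns differ only in what the optional segment may contain. A minimal sketch comparing them, assuming a hypothetical helper categoryNumber and a made-up URL with a percent-encoded space (neither is from the question):

```javascript
// The two patterns from this answer (no g flag: only the first match is needed).
const hyphenPattern = /\/category\/(?:[\w-]+\/)?(\d+)/;
const anySegmentPattern = /\/category\/(?:[^\/]+\/)?(\d+)/;

// Hypothetical helper: returns the Group 1 digits, or null when nothing matches.
function categoryNumber(rx, url) {
  const m = rx.exec(url);
  return m ? m[1] : null;
}

// On the question's URLs both patterns agree.
const demoUrls = [
  'https://m.aaa.kr/category/outer/55/',
  'https://m.aaa.kr/category/inner/5/',
  'https://m.aaa.kr/product/jacket/3031/category/55/display/1/',
  'https://m.aaa.kr/product/shirts/30/category/5/display/1/',
];
const hyphenResults = demoUrls.map((url) => categoryNumber(hyphenPattern, url));
const anySegmentResults = demoUrls.map((url) => categoryNumber(anySegmentPattern, url));

// Hypothetical URL: "%" is neither a word char nor "-", but it is "not /".
const encoded = 'https://m.aaa.kr/category/outer%20wear/55/';
const viaHyphen = categoryNumber(hyphenPattern, encoded);         // null
const viaAnySegment = categoryNumber(anySegmentPattern, encoded); // '55'

console.log(hyphenResults, anySegmentResults, viaHyphen, viaAnySegment);
```

The [^\/]+ variant is the more permissive choice; [\w-]+ is stricter if you only expect word chars and hyphens in that segment.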

See the JavaScript demo:

const urls = ['https://m.aaa.kr/category/outer/55/','https://m.aaa.kr/category/inner/5/','https://m.aaa.kr/product/jacket/3031/category/55/display/1/','https://m.aaa.kr/product/shirts/30/category/5/display/1/']
const rx = /\/category\/(?:[\w-]+\/)?(\d+)/;
for (const url of urls) {
    // Group 1 holds the category number; fall back to '' when there is no match
    document.body.innerHTML += '"' + url + '" => "<b>' + (rx.exec(url) || ['',''])[1] + '</b>"<br/>';
}
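
Leaving out the g flag also matters when one regex object is reused in a loop, as in the demo above. A minimal sketch, where rxGlobal is a hypothetical variant of the demo's regex with g added: with g, exec() starts at the regex's lastIndex and updates it after each call, so state carries over from one URL to the next:

```javascript
const loopUrls = [
  'https://m.aaa.kr/category/outer/55/',
  'https://m.aaa.kr/category/inner/5/',
  'https://m.aaa.kr/product/jacket/3031/category/55/display/1/',
  'https://m.aaa.kr/product/shirts/30/category/5/display/1/',
];

// Hypothetical variant WITH the g flag: exec() resumes from rxGlobal.lastIndex.
const rxGlobal = /\/category\/(?:[\w-]+\/)?(\d+)/g;
const withG = loopUrls.map((url) => {
  const m = rxGlobal.exec(url);
  return m ? m[1] : null;
});

// The demo's pattern WITHOUT g: exec() always searches from the start.
const rxNoG = /\/category\/(?:[\w-]+\/)?(\d+)/;
const withoutG = loopUrls.map((url) => {
  const m = rxNoG.exec(url);
  return m ? m[1] : null;
});

console.log(withG);    // ['55', null, '55', null]
console.log(withoutG); // ['55', '5', '55', '5']
```

The second and fourth URLs are missed with g because the search starts past their /category/ part; a failed exec() then resets lastIndex to 0, which is why the third URL matches again.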
