I have a URL structure where the first subdirectory is the region and the optional second one is the language override:
https://example.com/no/en
I'm trying to capture each of the two parts in its own group. This way, in the JS, I can do the following to get each part of the URL:
const pathname = window.location.pathname // '/no/en/mypage'
const match = pathname.match('xxx')
const region = match[1] // 'no' or '/no'
const language = match[2] // 'en' or '/en'
I have tried creating multiple regexes with no luck in nailing all of my requirements below:
This is the closest I have come, but it is error-prone because it also matches "/do" from /donotmatch. It uses the following regex:
(\/[a-z]{2})(\/[a-z]{2})?
The problem with this one is that it also matches cases like /noada.
I then tried to match first two a-z and then followed by either a forward slash or no characters like this: (\/[a-z]{2}\/|[^.])([a-z]{2}\/|[^.])?
I think I am not getting the syntax correct for the not part.
The regex I am trying to create has to meet these criteria in order not to break:
- /no - group 1 match(no), group 2 undefined
- /no/ - group 1 match(no), group 2 undefined
- /nona - no matches
- /no/en - group 1 match(no), group 2 match(en)
- /no/en/ - group 1 match(no), group 2 match(en)
- /no/enen - group 1 match(no), group 2 undefined
- /no/en/something - group 1 match(no), group 2 match(en)
- /no/en/jp - group 1 match(no), group 2 match(en) (jp is not going to be matched)
I feel I am really close to a working solution, but all my tries so far have been off in a slight way.
If the group part is not possible, I suppose also getting /xx/xx and then splitting by / is also an option.
CodePudding user response:
You may use this regex with an optional 2nd capture group:
\/(\w{2})(?:\/(\w{2}))?(?:\/|$)
RegEx Explanation:
\/ : Match the starting /
(\w{2}) : First capture group to match 2 word characters
(?:\/(\w{2}))? : Optional non-capture group that starts with a / followed by the second capture group to match 2 word characters
(?:\/|$) : Match the closing / or end of line
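As a rough check, the pattern can be run against the cases listed in the question. This is only a sketch: the leading ^ is an added assumption (so that only the start of the pathname is considered and a later segment is never picked up), and parsePath is a hypothetical helper name, not something from the answer.

```javascript
// Sketch: the pattern from this answer, with a leading ^ added (assumption)
// so that matching always starts at the beginning of the pathname.
const pathRegex = /^\/(\w{2})(?:\/(\w{2}))?(?:\/|$)/;

// Hypothetical helper: returns { region, language } or null when nothing matches.
// language is undefined when the optional second group does not participate.
function parsePath(pathname) {
  const match = pathname.match(pathRegex);
  if (!match) return null;
  return { region: match[1], language: match[2] };
}

// The test paths from the question.
const samples = [
  '/no', '/no/', '/nona', '/no/en', '/no/en/',
  '/no/enen', '/no/en/something', '/no/en/jp',
];
for (const p of samples) {
  console.log(p, parsePath(p));
}
```

In the browser, parsePath(window.location.pathname) would be the equivalent call.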
CodePudding user response:
Follow each capture with (?=$|/), which is a look ahead to assert that what comes next is either end of input or a slash.
https?://[^/]+/(\w\w)(?=$|/)(?:/(\w\w)(?=$|/))?
The second capture is wrapped in an optional non-capture group via (?:…)?
To be more strict and allow only letters, replace \w with [a-z], but \w may be enough for your needs.
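A minimal sketch of the look-ahead approach in JavaScript, applied to a full URL string. It assumes the host part is one or more non-slash characters ([^/]+) and uses the stricter [a-z] class; parseUrl and the example URLs are hypothetical, for illustration only.

```javascript
// Sketch: look-ahead variant for a full URL.
// Each two-letter capture must be followed by end of input or a slash.
// Slashes are escaped because this is a JavaScript regex literal.
const urlRegex = /https?:\/\/[^/]+\/([a-z]{2})(?=$|\/)(?:\/([a-z]{2})(?=$|\/))?/;

// Hypothetical helper: returns { region, language } or null when nothing matches.
function parseUrl(url) {
  const match = url.match(urlRegex);
  return match ? { region: match[1], language: match[2] } : null;
}

console.log(parseUrl('https://example.com/no/en'));   // region and language
console.log(parseUrl('https://example.com/no/enen')); // region only
console.log(parseUrl('https://example.com/nona'));    // no match
```

When only window.location.pathname is available, the scheme and host part can be dropped and the pattern anchored with ^ instead.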