I'm trying to split any URL that would show up on my website into three parts:
- Language (optional)
- Hierarchical structure of the page (parents)
- Current page
Right now I only use the first and third parts (language and current page), but I need a way to let pages share a name when they have different parents, so that only the full URL has to be unique.
Here are the types of URL I may have:
(nothing)
en
en/test
en/parent/test
test
parent/test
ggparent/gparent/parent/test
I thought about extending my current directive:
RewriteRule ^(?:([a-z]{2})(?=\/))?.*(?:\/([\w\-\,\ ] ))$ /index.php?lang=$1&page=$2 [L,NC]
to the following:
(?:([a-z]{2})(?=\/))?(.*)\/([^\/]*)?$
which I could then map to index.php?lang=$1&tree=$2&page=$3.
The difficulty is that the second capturing group also captures the leading slash.
From what I've found so far, I don't think a single regex can return every string between the slashes as its own group, with the last one always landing in the same group, without repeating the same sub-pattern. So I thought I would capture everything between the language and the current page as one group and process the tree in PHP.
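As a rough illustration of that post-processing step, the captured tree can be split on slashes into an ordered list of ancestors. This sketch is in JavaScript to match the test case further down (in PHP, `explode('/', $tree)` would play the same role), and the function name `splitTree` is made up for the example.

```javascript
// Hypothetical helper: turn the captured "tree" string into an array of
// ancestor names, outermost first. Empty segments (from a stray leading,
// trailing, or doubled slash) are dropped.
function splitTree(tree) {
  return (tree || '').split('/').filter(segment => segment !== '');
}

console.log(splitTree('ggparent/gparent/parent')); // [ 'ggparent', 'gparent', 'parent' ]
console.log(splitTree('/parent'));                 // [ 'parent' ]
console.log(splitTree(''));                        // []
```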
However my current regex has some problems and I can't figure them out:
- If language is on its own, it doesn't get captured
- The second group captures the slash between the language and the tree
Link to Regex101: https://regex101.com/r/ecHBQT/1
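Both problems can be reproduced with a few lines of JavaScript. This is only a quick check outside Apache, assuming the JavaScript regex engine treats this pattern the same way as mod_rewrite does; the names `attempt` and `groups` are just for illustration.

```javascript
// The extended pattern from above, unchanged.
const attempt = /(?:([a-z]{2})(?=\/))?(.*)\/([^\/]*)?$/;

// Hypothetical helper: return the three capture groups, or null if no match.
function groups(path) {
  const m = path.match(attempt);
  return m ? [m[1], m[2], m[3]] : null;
}

console.log(groups('en'));             // null: the pattern requires a slash
console.log(groups('en/parent/test')); // [ 'en', '/parent', 'test' ]
```

The first call fails because both the lookahead and the literal `\/` demand a slash that a bare language code does not have. The second shows the leading slash ending up in group 2, since nothing consumes it between the language and `(.*)`.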
CodePudding user response:
This likely does it: it splits the URL at the slashes into lang, tree, and page, any of which may be empty:
RewriteRule ^([a-z]{2}\b)?\/?(?:\/?(.+)\/)?(.*)$ /index.php?lang=$1&tree=$2&page=$3 [L,NC]
Testcase in JavaScript using this regex:
const regex = /^([a-z]{2}\b)?\/?(?:\/?(.+)\/)?(.*)$/;
[
'',
'en',
'en/test',
'en/parent/test',
'test',
'parent/test',
'ggparent/gparent/parent/test'
].forEach(str => {
let rewritten = str.replace(regex, '/index.php?lang=$1&tree=$2&page=$3');
console.log('"' + str + '" ==>', rewritten);
})
Output:
"" ==> /index.php?lang=&tree=&page=
"en" ==> /index.php?lang=en&tree=&page=
"en/test" ==> /index.php?lang=en&tree=&page=test
"en/parent/test" ==> /index.php?lang=en&tree=parent&page=test
"test" ==> /index.php?lang=&tree=&page=test
"parent/test" ==> /index.php?lang=&tree=parent&page=test
"ggparent/gparent/parent/test" ==> /index.php?lang=&tree=ggparent/gparent/parent&page=test
Notes:
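- This assumes that the first path segment is never exactly two letters long unless it is a language code; a two-letter page or parent in first position would be taken as the language (to avoid that, you could specify an explicit or-list of all the languages you have)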
- Learn more about regex: https://twiki.org/cgi-bin/view/Codev/TWikiPresentation2018x10x14Regex
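To illustrate the first note, here is a sketch of the same split with an explicit language list in place of `[a-z]{2}`. The codes `en`, `de`, and `fr` and the names `withLangList` and `splitPath` are placeholders, and the lookahead `(?=\/|$)` is swapped in for `\b` so that a code only counts when it is the complete first segment.

```javascript
// Hypothetical language list; replace with the codes your site supports.
// The "i" flag mirrors the NC flag of the RewriteRule.
const withLangList = /^(?:(en|de|fr)(?=\/|$))?\/?(?:(.+)\/)?(.*)$/i;

function splitPath(path) {
  const m = path.match(withLangList);
  // Unmatched optional groups are undefined; normalise them to ''.
  return { lang: m[1] || '', tree: m[2] || '', page: m[3] || '' };
}

console.log(splitPath('en/parent/test')); // { lang: 'en', tree: 'parent', page: 'test' }
console.log(splitPath('xy/test'));        // { lang: '', tree: 'xy', page: 'test' }
console.log(splitPath('en'));             // { lang: 'en', tree: '', page: '' }
```

With the list in place, a two-letter first segment such as `xy` falls through to the tree instead of being read as a language.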
CodePudding user response:
I hope I've understood your question right. You can try this regex:
^([a-z]{2}(?=\/|$))?(?:\/?(.+)\/)?(.*)
This will match three groups: first the language (two characters), then the parents, and finally the last part of the URL (after the last /).
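A quick JavaScript check of this pattern, assuming its second group reads `(.+)`; the names `pattern` and `parts` are illustrative only. It suggests one edge case: with no optional slash after the language group, a language followed by a single segment appears to leave the separating slash in the third group.

```javascript
const pattern = /^([a-z]{2}(?=\/|$))?(?:\/?(.+)\/)?(.*)/;

// Hypothetical helper: return [lang, tree, page], with '' for unmatched groups.
function parts(path) {
  const m = path.match(pattern);
  return [m[1] || '', m[2] || '', m[3] || ''];
}

console.log(parts('en'));             // [ 'en', '', '' ]
console.log(parts('en/parent/test')); // [ 'en', 'parent', 'test' ]
console.log(parts('parent/test'));    // [ '', 'parent', 'test' ]
console.log(parts('en/test'));        // [ 'en', '', '/test' ]
```

If that leading slash is unwanted, adding `\/?` directly after the first group, as the previous answer does, is one way to consume it.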