Splitting URL in three parts in htaccess - regex

Time:01-29

I'm trying to split any URL that would show up on my website into three parts:

  1. Language (optional)
  2. Hierarchical structure of the page (parents)
  3. Current page

Right now I work with parts 1 and 3, but I need a way to let pages share the same name when they have different parents, so that the full URL is still unique.

Here are the types of URL I may have:

(nothing)
en
en/test
en/parent/test
test
parent/test
ggparent/gparent/parent/test

I thought about extending my current directive:

RewriteRule ^(?:([a-z]{2})(?=\/))?.*(?:\/([\w\-\,\ ]+))$ /index.php?lang=$1&page=$2 [L,NC]

to the following:

(?:([a-z]{2})(?=\/))?(.*)\/([^\/]*)?$

I could then translate this to index.php?lang=$1&tree=$2&page=$3, but the difficulty is that the second capturing group also captures the leading slash.

Based on my search so far, I don't believe I can dynamically return every string between the slashes, and make the last one always come first, without repeating the same regex. So I thought I would capture everything between the language and the current page and process the tree in PHP.
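As a rough illustration of that last step, here is a minimal sketch of turning the captured tree string into a list of ancestors. The question plans to do this in PHP; JavaScript is used here only to match the test case further down the thread, and the helper name `splitTree` is hypothetical.

```javascript
// Hypothetical helper: turns the captured tree string into an ordered list of ancestors.
function splitTree(tree) {
  // An empty or missing tree means the page has no parents.
  if (!tree) return [];
  // Drop empty segments caused by stray leading, trailing, or doubled slashes.
  return tree.split('/').filter(segment => segment !== '');
}

console.log(splitTree('ggparent/gparent/parent')); // [ 'ggparent', 'gparent', 'parent' ]
console.log(splitTree('/parent'));                 // [ 'parent' ]
console.log(splitTree(''));                        // []
```

Filtering out empty segments means a stray leading slash in the captured group does not produce a bogus empty parent.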

However, my current regex has some problems that I can't figure out:

  1. If the language is on its own, it doesn't get captured
  2. The second group captures the slash between the language and the tree
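Both problems can be reproduced outside Apache. The sketch below runs the proposed pattern through JavaScript's regex engine, which behaves the same way for this pattern; the helper name `splitProposed` is hypothetical.

```javascript
// The proposed pattern from the question, unchanged.
const proposed = /(?:([a-z]{2})(?=\/))?(.*)\/([^\/]*)?$/;

// Hypothetical helper: returns the three groups, or null when nothing matches.
function splitProposed(url) {
  const m = proposed.exec(url);
  return m ? { lang: m[1], tree: m[2], page: m[3] } : null;
}

// Problem 1: the pattern requires at least one slash, so a bare language fails entirely.
console.log(splitProposed('en'));             // null

// Problem 2: the lookahead does not consume the slash, so it ends up in the tree group.
console.log(splitProposed('en/parent/test')); // { lang: 'en', tree: '/parent', page: 'test' }
```

In the first case the lookahead `(?=\/)` rejects the language and the mandatory `\/` then has nothing to match. In the second, the lookahead only checks for the slash without consuming it, so `(.*)` starts at that slash.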

Link to Regex101: https://regex101.com/r/ecHBQT/1

CodePudding user response:

This likely does it. The rule splits the URL at the appropriate slashes into lang, tree, and page, any of which may be empty:

RewriteRule ^([a-z]{2}\b)?\/?(?:\/?(.+)\/)?(.*)$ /index.php?lang=$1&tree=$2&page=$3 [L,NC]

Test case in JavaScript using this regex:

const regex = /^([a-z]{2}\b)?\/?(?:\/?(.+)\/)?(.*)$/;
[
  '',
  'en',
  'en/test',
  'en/parent/test',
  'test',
  'parent/test',
  'ggparent/gparent/parent/test'
].forEach(str => {
  let rewritten = str.replace(regex, '/index.php?lang=$1&tree=$2&page=$3');
  console.log('"' + str + '" ==>', rewritten);
})

Output:

"" ==> /index.php?lang=&tree=&page=
"en" ==> /index.php?lang=en&tree=&page=
"en/test" ==> /index.php?lang=en&tree=&page=test
"en/parent/test" ==> /index.php?lang=en&tree=parent&page=test
"test" ==> /index.php?lang=&tree=&page=test
"parent/test" ==> /index.php?lang=&tree=parent&page=test
"ggparent/gparent/parent/test" ==> /index.php?lang=&tree=ggparent/gparent/parent&page=test

CodePudding user response:

I hope I've understood your question correctly. You can try this regex:

^([a-z]{2}(?=\/|$))?(?:\/?(.+)\/)?(.*)


This matches three groups: first the language (two characters), then the parents, and finally the last part of the URL (after the final /).
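To see what the three groups hold for a few of the question's URLs, here is a small sketch that runs this pattern (with `.+` in the second group) in JavaScript. The helper name `groups` is hypothetical, and unmatched optional groups are normalised to empty strings for readability.

```javascript
// The pattern from this answer.
const pattern = /^([a-z]{2}(?=\/|$))?(?:\/?(.+)\/)?(.*)/;

// Hypothetical helper: returns [lang, tree, page], with unmatched groups as ''.
function groups(url) {
  // Every part of the pattern is optional, so exec() always returns a match here.
  const m = pattern.exec(url);
  return [m[1] || '', m[2] || '', m[3] || ''];
}

console.log(groups('en'));             // [ 'en', '', '' ]
console.log(groups('en/parent/test')); // [ 'en', 'parent', 'test' ]
console.log(groups('parent/test'));    // [ '', 'parent', 'test' ]
console.log(groups('en/test'));        // [ 'en', '', '/test' ]
```

One case worth checking is `en/test`: the optional middle group cannot match because only one slash follows the language, so the last group takes the rest of the string, including the leading slash. If that matters, the receiving script would presumably need to trim it.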
