htaccess - use custom variable within rules

Time:10-29

I have an .htaccess file, and in it I have something like this:

RewriteRule ^((en|us|uk|fr|de)/)?([0-9\-.]+);([0-9\-.]+)$ index.php?l=$2&t=$3;$4 [QSA,L]
RewriteRule ^((en|us|uk|fr|de)/)?([A-Za-z\-]+)$ index.php?l=$2&t=$3 [QSA,L]

and many similar lines. Also, in all rules, the language may not be present at all.

Can I somehow put the list of languages (en|us|uk|fr|de) into a variable and use only this variable? With the current approach, adding a new language means rewriting many rules.

Answer:

Can I somehow put the list of languages (en|us|uk|fr|de) into a variable and use only this variable?

You can't use a "variable" directly in the regular expression: mod_rewrite does not expand variables in the RewriteRule pattern, and the regex engine (PCRE) used by Apache has no syntax for this.

You could instead make the regex more generic and match any 2 lowercase letters and rely on your application to validate the language code (which you should be doing anyway). You then don't need to update your Apache config at all when adding a new language (which would be preferable). For example:

RewriteRule ^(([a-z]{2})/)?([0-9\-.]+);([0-9\-.]+)$ index.php?l=$2&t=$3;$4 [QSA,L]

In addition, provided you don't have any other non-language URLs that could legitimately have 2 lowercase letters as the first path segment, you could also validate the language code in .htaccess with an additional rule that precedes your existing language rules. For example:

# Validate language code in first path segment
RewriteCond $1 !^(en|us|uk|fr|de)$
RewriteRule ^([a-z]{2})/ - [R=404]

The above rule states: if 2 lowercase letters are passed in the first path segment and this 2-character sequence does not match one of the stated language codes, then trigger a 404. No subsequent rules are processed.

This allows you to state the valid language codes just once at the top of the file. But this does restrict your URL structure (without additional rules/conditions), in that you can't have a URL of the form /xx/... where xx is not a language code.
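Put together, the file might look something like the sketch below. It assumes the character classes in the original rules are each followed by + (one or more), and that RewriteEngine On is not already set elsewhere; the two language rules are simply the question's rules made generic.

```apache
RewriteEngine On

# Validate language code in first path segment.
# This is the only place the list of languages appears.
RewriteCond $1 !^(en|us|uk|fr|de)$
RewriteRule ^([a-z]{2})/ - [R=404]

# Generic language rules: any 2 lowercase letters are accepted here,
# because an invalid code has already been rejected above.
RewriteRule ^(([a-z]{2})/)?([0-9\-.]+);([0-9\-.]+)$ index.php?l=$2&t=$3;$4 [QSA,L]
RewriteRule ^(([a-z]{2})/)?([A-Za-z\-]+)$ index.php?l=$2&t=$3 [QSA,L]
```

With this layout, a request for /fr/12.5;3.4 should be rewritten to index.php?l=fr&t=12.5;3.4, a request for /12.5;3.4 to index.php?l=&t=12.5;3.4, and a request for /xx/12.5;3.4 should get a 404. Adding a language then means editing only the RewriteCond line.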


UPDATE: Using a "variable"...

Having had another think about this... You could potentially use a "variable", but you would need to add a condition (RewriteCond directive) to each rule in order to compare the language code in the requested URL-path to the "list" of language codes in the (environment) variable.

For example:

# Define "list" of valid language codes
RewriteRule ^ - [E=LANG_CODES:en|us|uk|fr|de]

RewriteCond %{ENV:LANG_CODES}@$2 ([a-z]{2}).*@\1?$
RewriteRule ^(([a-z]{2})/)?([0-9\-.]+);([0-9\-.]+)$ index.php?l=$2&t=$3;$4 [QSA,L]

The value assigned to the LANG_CODES environment variable is just a string of language codes separated by any unique character(s). I used the pipe (vertical bar) as the separator, like the regex alternation, but this is not a regex.

The regex ([a-z]{2}).*@\1?$ uses an internal backreference (\1) to match the language code passed in the URL-path with a language code in the LANG_CODES string. The additional complication is that there may not be a language code at all (hence the need for the trailing ?$). This regex isn't particularly efficient as it could involve a lot of backtracking (although this is a relatively minor issue in this case).
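To make the matching concrete, here is the same condition annotated with the TestString it would see for three example requests. The request paths are made up for illustration, and the RewriteRule assumes the character classes are followed by + (one or more).

```apache
# TestString is "<LANG_CODES>@<language code from the URL-path>"
# CondPattern is ([a-z]{2}).*@\1?$
#
#   /fr/12;34  ->  TestString "en|us|uk|fr|de@fr"
#       ([a-z]{2}) captures "fr" from the list, .* skips "|de",
#       then @\1$ matches "@fr" at the end          -> condition succeeds
#
#   /12;34     ->  TestString "en|us|uk|fr|de@"
#       \1 is optional, so any code followed (eventually) by "@"
#       at the end of the string is enough          -> condition succeeds
#
#   /xx/12;34  ->  TestString "en|us|uk|fr|de@xx"
#       no code in the list equals "xx", and "@" is not at the
#       end of the string                           -> condition fails
RewriteCond %{ENV:LANG_CODES}@$2 ([a-z]{2}).*@\1?$
RewriteRule ^(([a-z]{2})/)?([0-9\-.]+);([0-9\-.]+)$ index.php?l=$2&t=$3;$4 [QSA,L]
```

When the condition fails, this rule is skipped and processing continues with the next rule.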

Needless to say, this would potentially add a lot of "bloat" if you have many rules. The use of the env var could also be problematic if any "looping" occurs in the rewrite engine, since the env var could be "renamed" (after an internal redirect, Apache prefixes existing env vars with REDIRECT_, so LANG_CODES becomes REDIRECT_LANG_CODES). Other rules might need to be modified to allow for this, or to prevent looping altogether.
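One way to prevent looping altogether is a guard at the top of the file, sketched below. This guard is an assumption on my part rather than something the rules above require, and it relies on REDIRECT_STATUS being empty on the initial request and set after the first internal rewrite.

```apache
# Hypothetical guard, placed before all other rules.
# If REDIRECT_STATUS is non-empty, the request has already been
# rewritten once, so stop here and leave it untouched.
RewriteCond %{ENV:REDIRECT_STATUS} .
RewriteRule ^ - [L]

# Define "list" of valid language codes (only reached on the first pass)
RewriteRule ^ - [E=LANG_CODES:en|us|uk|fr|de]
```

On Apache 2.4 you could probably get a similar effect by using the END flag instead of L on the rules that rewrite to index.php, but check how either option interacts with any other rules you have.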

This "update" is really just of "academic" interest (although it does avoid conflict with any non-language URL that happens to have an initial path segment that consists of just two characters). The first solution I presented above would be preferable.


Aside: I would make the first group in the RewriteRule pattern non-capturing, so that the language code is available in the $1 backreference as opposed to $2. For example:

RewriteRule ^(?:([a-z]{2})/)?([0-9\-.]+);([0-9\-.]+)$ index.php?l=$1&t=$2;$3 [QSA,L]

As a general rule, any regex group before the first one you are interested in should be non-capturing, so your captured groups of interest always start at $1.
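The same change applied to the second rule from the question might look like this. It is a sketch that assumes the character class is followed by + (one or more); the original form is kept as a comment for comparison.

```apache
# Original form: the outer group takes $1, so the language code is $2.
#   RewriteRule ^(([a-z]{2})/)?([A-Za-z\-]+)$ index.php?l=$2&t=$3 [QSA,L]

# Non-capturing outer group (?:...): numbering now starts at the
# language code, so it is $1 and the rest of the path is $2.
RewriteRule ^(?:([a-z]{2})/)?([A-Za-z\-]+)$ index.php?l=$1&t=$2 [QSA,L]
```

If you combine this with the environment variable approach, the RewriteCond TestString would need to reference $1 instead of $2.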
