Using regex in Apache's config files and .htaccess

Time:05-19

If I understand correctly, the expression .ht* in the following code matches everything that starts with .ht, so my .ht_lalala is safe.

<Files ".ht*">
    Require all denied
</Files>

But what about next one?

(^\.ht|~$|back|BACK|backup|BACKUP$)

Is it correct for matching files: .htaccess, back, backup, BACKUP? Or next will be better instead

(^\.ht*|back*|BACK*$)

What I'd like to understand is what ~$ actually means in my code. I don't know where I saw it, but I have it in my code, and now I doubt that it's correct. Maybe it was meant to be something like (^.ht|~$), as just one group.


I know the basics of regex: what ^ and $ mean, and that * means zero or more of the previous token. But ~ doesn't make sense inside the pattern, unless it's just an ordinary character that matches a literal ~. I've read the Apache docs, and I gather that FilesMatch and DirectoryMatch are better for multiple matches. However, regular expressions can also be used in the Files and Directory directives by adding the ~ character, as the examples in the docs show.

<Files ~ "\.(gif|jpe?g|png)$">
    #...
</Files>

What I really want to know is how to match several different files or directories.

One more thing, should I escape the .? Because default httpd.conf doesn't do so. Or it's just different for httpd.conf and .htaccess (which doesn't make sense to me)

CodePudding user response:

<Files ".ht*">

In this context, .ht* is not a regular expression (regex). It is a "wild-card string", where ? matches any single character, and * matches any sequence of characters. (Although .ht* is also a valid regex, it would match differently if interpreted as one.)
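To see how the same string behaves under the two interpretations, here is a minimal sketch. It assumes Python's `fnmatch` and `re` modules as stand-ins for Apache's wild-card matching and PCRE; for a pattern this simple they should agree with Apache, but this is an illustration, not Apache itself. The file names are made up.

```python
import re
from fnmatch import fnmatchcase

PATTERN = ".ht*"
names = [".htaccess", ".ht_lalala", ".h", "ahead.txt"]  # hypothetical file names


def wildcard_match(name: str) -> bool:
    # Wild-card semantics: the whole name must fit the pattern;
    # "." is a literal dot and "*" is any run of characters.
    return fnmatchcase(name, PATTERN)


def regex_match(name: str) -> bool:
    # Regex semantics (unanchored search): "." is any character,
    # "h" is literal, and "t*" is zero or more "t".
    return re.search(PATTERN, name) is not None


for name in names:
    print(f"{name!r}: wildcard={wildcard_match(name)} regex={regex_match(name)}")
```

Read as a regex, the pattern matches any name that contains some character followed by h, which is much broader than the wild-card reading.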

But what about next one?

(^\.ht|~$|back|BACK|backup|BACKUP$)

This is a regex. It cannot be used in the <Files> directive as you have written above unless you enable regex pattern matching with the ~ argument, as you do later.

In this regex, ~$ matches any string that ends with a literal ~ (tilde character). This is sometimes used to mark backup files.

It also matches...

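  • Any string that starts with .ht (which naturally includes .htaccess).
  • Any string that contains back or BACK or backup (matching backup is obviously redundant).
  • Any string that ends with BACKUP (also redundant, since BACK already matches any string that contains BACK).

The ^ and $ anchors apply only to the alternative they appear in, not to the whole group. Consequently, this is probably not doing quite what you think it's doing.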
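A quick way to check the points above is to run the pattern against a few names. This is a sketch that assumes Python's `re` module as a stand-in for Apache's PCRE (the two agree for a pattern like this), with made-up file names. The `STRICTER` pattern is hypothetical: it assumes the intent is to match names starting with .ht, names ending in ~, and files named exactly back, backup, BACK or BACKUP.

```python
import re

# The pattern from the question, searched unanchored.
ORIGINAL = re.compile(r"(^\.ht|~$|back|BACK|backup|BACKUP$)")

# Hypothetical stricter variant (an assumption about the intent):
# the back/backup names must be the whole file name.
STRICTER = re.compile(r"^\.ht|~$|^(back(up)?|BACK(UP)?)$")

names = [".htaccess", "index.php~", "backup", "BACKUP",
         "feedback.html", "my_BACKUP", "index.html"]  # hypothetical names

for name in names:
    original = ORIGINAL.search(name) is not None
    stricter = STRICTER.search(name) is not None
    print(f"{name!r}: original={original} stricter={stricter}")
```

The original pattern also catches names such as feedback.html, because the back alternative is not anchored.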

Or next will be better instead

(^\.ht*|back*|BACK*$)

Whilst this is a valid regex, you appear to have mixed in "wild-card" syntax. Bear in mind that in regex, the * quantifier matches the previous token 0 or more times. It does not match "any characters", as in wild-card pattern matching.

This still matches ".htaccess", but only because the pattern is not anchored at the end: ^\.ht* matches the leading ".ht" and ignores the rest. For example, ^\.ht*$ (with an end-of-string anchor) would not match ".htaccess"; it matches only ".h" followed by zero or more "t" characters.
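The effect of the * quantifier and the missing anchor can be checked directly. This is a sketch that again assumes Python's `re` module as a stand-in for Apache's PCRE; the file names are made up for illustration.

```python
import re

loose = re.compile(r"(^\.ht*|back*|BACK*$)")  # pattern from the question
anchored = re.compile(r"^\.ht*$")             # same first alternative, fully anchored

for name in [".htaccess", ".hidden", "bacon.txt", ".h", ".httt"]:
    print(
        f"{name!r}: loose={loose.search(name) is not None} "
        f"anchored={anchored.search(name) is not None}"
    )
```

Note that the loose pattern matches ".hidden" (since t* allows zero t) and "bacon.txt" (since k* allows zero k), which is unlikely to be intended.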

<Files ~ "\.(gif|jpe?g|png)$">

With the Files directive, the ~ argument enables regex pattern matching. (As you've stated.) This is quite different from when ~ is used inside the regex pattern itself.

One more thing, should I escape the .? Because default httpd.conf doesn't do so. Or it's just different for httpd.conf and .htaccess (which doesn't make sense to me)

I think you're mixing things up. In your first example, it's not a regex, it's a "wild-card" pattern (as stated above). In this context, the . must not be backslash-escaped. It matches a literal . (dot). The . carries no special meaning here. The . should only be escaped if you need to match a literal dot in a regular expression.

For example, the following are equivalent:

# Wild-card string match
<Files ".ht*">

and

# Regex pattern match
<Files ~ "^\.ht">

(However, it is preferable to use FilesMatch instead of Files ~ to avoid any confusion. FilesMatch is "newer" syntax.)
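As a sanity check on that equivalence, the sketch below compares the wild-card string with the anchored regex over a few names. It assumes Python's `fnmatch` and `re` as stand-ins for Apache's wild-card and PCRE matching, and the names are hypothetical; agreement on a sample list is not a proof.

```python
import re
from fnmatch import fnmatchcase

names = [".htaccess", ".htpasswd", ".ht", ".h", "x.htaccess", "index.html"]


def by_wildcard(name: str) -> bool:
    # Stand-in for: <Files ".ht*">
    return fnmatchcase(name, ".ht*")


def by_regex(name: str) -> bool:
    # Stand-in for: <Files ~ "^\.ht">
    return re.search(r"^\.ht", name) is not None


for name in names:
    print(f"{name!r}: wildcard={by_wildcard(name)} regex={by_regex(name)}")
```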

There is no difference between httpd.conf and .htaccess in this regard.

CodePudding user response:

When in doubt, RTFM.

~ enables regex. Without it, you just get access to wildcards ? and *.

As far as I know Apache uses the PCRE flavor of regex.

So once you've enabled regex via ~ then use https://regex101.com/r/lPkMHK/1 to test the behavior of the regex you've written.
