Using RegEx in Apache's directives in config files and .htaccess

Time:05-19

If I understand correctly, the expression .ht* in the following code will match everything that starts with .ht, so my .ht_lalala is safe.

<Files ".ht*">
    Require all denied
</Files>

But what about the next one?

(^\.ht|~$|back|BACK|backup|BACKUP$)

Is it correct for matching the files .htaccess, back, backup, BACKUP? Or would the next one be better instead?

(^\.ht*|back*|BACK*$)

What I'd like to understand is what ~$ actually means in my code (in the RegEx pattern). I don't know why or when I put it there, but I have it in my code, and now I doubt that it's correct.


I know the basics of RegEx: what ^ and $ are, and that * means zero or more of the previous token. But ~ doesn't make sense inside the pattern, unless it's just an ordinary character that literally matches ~. I've read the Apache docs. I guess FilesMatch and DirectoryMatch are better for multiple matches; however, regular expressions can also be used in the Files and Directory directives with the addition of the ~ character, as shown in the docs examples.

<Files ~ "\.(gif|jpe?g|png)$">
    #...
</Files>

What I want to know, exactly, is how to match different files or directories.

One more thing: should I escape the .? The default httpd.conf doesn't do so. Or is it just different for httpd.conf and .htaccess (which doesn't make sense to me)?


UPDATE

To answer my own question (how do I match any of .ht, .htaccess, .htpasswd, back, BACK, backup, BACKUP with RegEx): first of all, I decided to put a . (dot) at the start of the name of anything I want to hide. Secondly, I found that the concise pattern ^(\..*)$ does the job and gives me what I need. So if I want to hide something in the future, I just add a . at the start of its name.

Here we go: the following code denies web access to any files and directories whose names start with . (tested, works).

<FilesMatch "^(\..*)$">
    Require all denied
</FilesMatch>

<DirectoryMatch "^(\..*)$">
    Require all denied
</DirectoryMatch>

CodePudding user response:

<Files ".ht*">

In this context, .ht* is not a regular expression (regex). It is a "wild-card string", where ? matches any single character, and * matches any sequence of characters. (Whilst .ht* is also a valid regex, as a regex it would match differently.)
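To make the difference concrete, here is a rough sketch in Python. It is only an approximation: fnmatchcase stands in for Apache's wild-card matching and re.search for its (unanchored) regex matching, and the sample file names are made up. For a pattern this simple the two engines should behave the same way as Apache's, but that is an assumption, not something taken from the Apache docs.

```python
import re
from fnmatch import fnmatchcase

PATTERN = ".ht*"
# Hypothetical file names, chosen to show where the two readings disagree.
NAMES = [".htaccess", ".ht_lalala", ".h", "xh", "what.txt"]

def as_wildcard(name):
    """Wild-card reading: '.' is literal, '*' is any run of characters."""
    return fnmatchcase(name, PATTERN)

def as_regex(name):
    """Regex reading: any one character, then 'h', then zero or more 't'."""
    return re.search(PATTERN, name) is not None

for name in NAMES:
    print(f"{name!r:14} wildcard={as_wildcard(name)!s:5} regex={as_regex(name)}")
```

Read as a wild-card string, the pattern only accepts names beginning with .ht. Read as a regex, it also accepts names such as xh or what.txt, because the dot is "any character" and t* may match nothing.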

But what about the next one?

(^\.ht|~$|back|BACK|backup|BACKUP$)

This is a regex. It cannot be used in the <Files> directive as you have written above without enabling regex pattern matching with the ~ argument, as you have used later.

In this regex, ~$ matches any string that ends with a literal ~ (tilde character). This is sometimes used to mark backup files.

It also matches...

  • Any string that starts with .ht (which naturally includes .htaccess).
  • Any string that contains back or BACK or backup (matching backup is obviously redundant).
  • Any string that ends with BACKUP.

Consequently, this does not look like it's doing quite what you think it's doing.
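The points above can be checked with a small Python sketch. Python's re module is used here as a stand-in for Apache's PCRE, and the file names are invented for illustration. The STRICTER pattern is hypothetical: it assumes the intent is to block only dotfiles starting with .ht, names ending in ~, and files named exactly back, BACK, backup or BACKUP.

```python
import re

# The pattern from the question, searched anywhere in the name (no implicit anchoring).
ORIGINAL = re.compile(r"(^\.ht|~$|back|BACK|backup|BACKUP$)")

# Hypothetical stricter variant: every alternative must cover the whole name.
STRICTER = re.compile(r"^(\.ht.*|.*~|back(up)?|BACK(UP)?)$")

NAMES = [".htaccess", "notes.txt~", "backup", "BACKUP",
         "feedback.html", "index.html"]

def hits(pattern, name):
    """Return True if the pattern matches somewhere in the file name."""
    return pattern.search(name) is not None

for name in NAMES:
    print(f"{name:15} original={hits(ORIGINAL, name)!s:5} stricter={hits(STRICTER, name)}")
```

The name feedback.html is the interesting case: the original pattern catches it because it merely contains back, while the anchored variant leaves it alone.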

Or would the next one be better instead?

(^\.ht*|back*|BACK*$)

Whilst this is a valid regex, you've evidently slipped back into "wild-card" pattern matching. Bear in mind that in regex, the * quantifier matches the previous token 0 or more times. It does not match "any characters", as it does in wild-card pattern matching.

This still matches ".htaccess", but only because the pattern is not anchored at the end: ^\.ht* matches just the leading .ht and ignores the rest of the name. For example, ^\.ht*$ (with an end-of-string anchor) would not match ".htaccess".
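A quick way to see what the * quantifier actually consumes is to print the matched text. This is a sketch using Python's re module as a stand-in for PCRE; the helper name matched_text is made up for the example.

```python
import re

def matched_text(pattern, name):
    """Return the text the pattern matched, or None if there is no match."""
    m = re.search(pattern, name)
    return m.group(0) if m else None

# Without an end anchor, only the leading part of the name is matched.
print(matched_text(r"^\.ht*", ".htaccess"))
# With an end anchor, 'access' is left over, so there is no match.
print(matched_text(r"^\.ht*$", ".htaccess"))
# 't*' repeats the letter t, so this name does match.
print(matched_text(r"^\.ht*$", ".httt"))
# '.*' is the regex spelling of "any characters".
print(matched_text(r"^\.ht.*$", ".htaccess"))
```

If an anchored pattern is wanted, .* rather than a bare * is the regex counterpart of the wild-card *.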

<Files ~ "\.(gif|jpe?g|png)$">

With the Files directive, the ~ argument enables regex pattern matching. (As you've stated.) This is quite different from when ~ is used inside the regex pattern itself.

One more thing: should I escape the .? The default httpd.conf doesn't do so. Or is it just different for httpd.conf and .htaccess (which doesn't make sense to me)?

I think you're mixing things up. In your first example, it's not a regex, it's a "wild-card" pattern (as stated above). In this context, the . must not be backslash-escaped. It matches a literal . (dot). The . carries no special meaning here. The . should only be escaped if you need to match a literal dot in a regular expression.
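As an illustration of why the escape matters in a regex, the sketch below compares an unescaped and an escaped dot. It uses Python's re module as a stand-in for PCRE, and the names UNESCAPED, ESCAPED and the sample file names are invented for the example.

```python
import re

UNESCAPED = r"^.ht"   # '.' means any single character
ESCAPED = r"^\.ht"    # '\.' means a literal dot

def found(pattern, name):
    """Return True if the regex matches somewhere in the file name."""
    return re.search(pattern, name) is not None

for name in [".htaccess", "xhtml.txt"]:
    print(f"{name:10} unescaped={found(UNESCAPED, name)!s:5} escaped={found(ESCAPED, name)}")
```

Both patterns accept .htaccess, but only the unescaped one also accepts xhtml.txt, which is usually not what is intended.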

For example, the following are equivalent:

# Wild-card string match
<Files ".ht*">

and

# Regex pattern match
<Files ~ "^\.ht">

(However, it is preferable to use FilesMatch instead of Files ~ to avoid any confusion. FilesMatch is "newer" syntax.)

There is no difference between httpd.conf and .htaccess in this regard.

CodePudding user response:

When in doubt, Read The Fine Manual.

~ enables regex. Without it, you just get access to wildcards ? and *.

As far as I know Apache uses the PCRE flavor of regex.

So once you've enabled regex via ~ then use https://regex101.com/r/lPkMHK/1 to test the behavior of the regex you've written.
