failregex misses entries

Time:05-15

I need to match ›page not found‹ (404) log entries like this one:

185.220.100.252 - - [13/May/2022:10:03:58 +0200] "GET /EXPLOIT.php HTTP/1.1" 404 14780 "-" "Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/96.0.4664.110 Safari/537.36"

This failregex basically works

^<HOST> -\s*- \[.*\] "GET .*" 404 \d+ "-" ".*"$

and finds 8900 out of 30k entries. I'm testing with

fail2ban-regex /var/log/apache2/scienceblog.at.access.log '^<HOST> -\s*- \[.*\] "GET .*" 404 \d+ "-" ".*"$'

And so does

^<HOST> -\s*- \[.*.*\] "GET .*" 404 \d+ "-" ".*"$

But when I try to be more specific about what is between the square brackets, as in any of

^<HOST> -\s*- \[.*\d.*\] "GET .*" 404 \d+ "-" ".*"$
^<HOST> -\s*- \[.*\s.*\] "GET .*" 404 \d+ "-" ".*"$
^<HOST> -\s*- \[.* .*\] "GET .*" 404 \d+ "-" ".*"$
^<HOST> -\s*- \[\d.*\] "GET .*" 404 \d+ "-" ".*"$
^<HOST> -\s*- \[.*0200\] "GET .*" 404 \d+ "-" ".*"$
^<HOST> -\s*- \[.* .*\] "GET .*" 404 \d+ "-" ".*"$

or anything else (let alone a regex that evaluates the whole date string), the filter doesn't find a single log entry, and I can't figure out why. I've already read what I could find on fail2ban-regex here and elsewhere, but to no avail.

Answer:

The failregex is matched against the log entry with the date removed. So for your example

185.220.100.252 - - [13/May/2022:10:03:58 +0200] "GET /EXPLOIT.php HTTP/1.1" 404 14780 "-" "Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/96.0.4664.110 Safari/537.36"

fail2ban has extracted the date on its own

13/May/2022:10:03:58 +0200

and removed it from the log entry, and so is actually matching your regex against

185.220.100.252 - - [] "GET /EXPLOIT.php HTTP/1.1" 404 14780 "-" "Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/96.0.4664.110 Safari/537.36"

So the regexes that worked for you work because

\[.*\] and \[.*.*\] both match [], while the other ones only match if there is actually something between the brackets.

IMHO this is not at all intuitive, since the "missed lines" output still includes the date:

Lines: 1 lines, 0 ignored, 0 matched, 1 missed
[processed in 0.01 sec]

|- Missed line(s):
|  185.220.100.252 - - [13/May/2022:10:03:58 +0200] "GET /EXPLOIT.php HTTP/1.1" 404 14780 "-" "Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/96.0.4664.110 Safari/537.36"

But you can verify this is the case since this will give a successful match:

'^<HOST> -\s*- \[\] "GET .*" 404 \d+ "-" ".*"$'
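
If you want to see the effect outside of fail2ban, here is a rough Python sketch that imitates the behavior described above: cut the timestamp out of the line first, then apply the failregex to what is left. This is not fail2ban's actual code. DATE_RE and HOST_RE are simplified stand-ins I made up for its date detection and its <HOST> tag (IPv4 only), and the user agent in LINE is shortened.

```python
import re

# Sample log line from the question (user agent shortened for readability).
LINE = ('185.220.100.252 - - [13/May/2022:10:03:58 +0200] '
        '"GET /EXPLOIT.php HTTP/1.1" 404 14780 "-" "Mozilla/5.0"')

# Assumption: simplified stand-in for fail2ban's date detection.
DATE_RE = re.compile(r'\d{2}/[A-Za-z]{3}/\d{4}:\d{2}:\d{2}:\d{2} [+-]\d{4}')

# Assumption: simplified stand-in for the <HOST> tag (IPv4 only).
HOST_RE = r'(?P<host>\d{1,3}(?:\.\d{1,3}){3})'


def strip_date(line):
    """Remove the first timestamp from the line, leaving empty brackets."""
    return DATE_RE.sub('', line, count=1)


def matches(failregex, line):
    """Apply a failregex to the line after the date has been cut out."""
    pattern = failregex.replace('<HOST>', HOST_RE)
    return re.search(pattern, strip_date(line)) is not None


TAIL = r' "GET .*" 404 \d+ "-" ".*"$'

if __name__ == '__main__':
    print(strip_date(LINE))
    for brackets in (r'\[.*\]', r'\[.*\d.*\]', r'\[.*0200\]', r'\[\]'):
        failregex = r'^<HOST> -\s*- ' + brackets + TAIL
        print(brackets, matches(failregex, LINE))
```

Only the bracket patterns that can match an empty [] succeed here; anything that requires a character between the brackets fails, which is the same thing you are seeing with fail2ban-regex.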

Further reading:

https://dee.underscore.world/blog/fail2ban-filters/
