Fluentd: problem with regex while parsing log

Time:01-08

I have this fluentd configuration:

<source>
  @type tail
  <parse>
    @type regexp
    expression /^(?<remote>[^ ]*) (?<host>[^ ]*) (?<user>[^ ]*) \[(?<time>[^\]]*)\] \"(?<method>\w+) (?<path>[^ ]*) (?<http>[^ ]*)" (?<status_code>[^ ]*) (?<size>[^ ]*)(?:\s"(?<referer>[^\"]*)") "(?<agent>[^\"]*)" (?<urt>[^\"]*).*/
    time_format %d/%b/%Y:%H:%M:%S %z
    keep_time_key true
    types size:integer,reqtime:float,uct:float,uht:float,urt:float
  </parse>
  path /var/log/nginx/access.log
  pos_file /tmp/fluent_nginx.pos
  tag nginx
</source>

My log format:

193.137.78.17 - - [07/Jan/2023:09:21:59 +0000] "GET /net/api/employee HTTP/1.1" 200 2323 "-" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/108.0.0.0 Safari/537.36" 0.014
193.137.78.17 - - [07/Jan/2023:09:22:00 +0000] "GET /net/api/employee HTTP/1.1" 200 2323 "-" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/108.0.0.0 Safari/537.36" 0.005

I've tested my regex on regex101 and it works without problems. Still, I get a "no patterns matched" warning from fluentd. I don't understand why the log isn't parsed correctly.

Jan 07 09:26:26 srv-api fluentd[14878]: 2023-01-07 09:26:26 +0000 [warn]: #0 no patterns matched tag="nginx"

Can anyone help me, please? Thanks!

CodePudding user response:

I think your problem is leading spaces in the log.

Your pattern insists that <remote> has nothing before it, but the log lines you posted have 4 spaces before the remote IP.

The simplest fix, to my mind, is to allow an optional run of spaces at the beginning:

^( )*(?<remote>[^ ]*) (?<host>[^ ]*) (?<user>[^ ]*) \[(?<time>[^\]]*)\] \"(?<method>\w+) (?<path>[^ ]*) (?<http>[^ ]*)" (?<status_code>[^ ]*) (?<size>[^ ]*)(?:\s"(?<referer>[^\"]*)") "(?<agent>[^\"]*)" (?<urt>[^\"]*).*

How it works

The ( and ) are just to make life easier for people reading the code: they will see that between them is a space character, which they might not otherwise notice.

The * means 0 or more of these.

This allows 0 or more spaces at the beginning of the line to be matched and discarded.
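
Applied to the configuration from the question, the change might look like the sketch below. Only the start of the expression differs. The path, pos_file, tag and time_format values are copied from the question, and the fragment has not been run against a live nginx log, so treat it as a starting point rather than a verified fix.

```
<source>
  @type tail
  path /var/log/nginx/access.log
  pos_file /tmp/fluent_nginx.pos
  tag nginx
  <parse>
    @type regexp
    # "^( )*" accepts zero or more leading spaces before the remote address
    expression /^( )*(?<remote>[^ ]*) (?<host>[^ ]*) (?<user>[^ ]*) \[(?<time>[^\]]*)\] \"(?<method>\w+) (?<path>[^ ]*) (?<http>[^ ]*)" (?<status_code>[^ ]*) (?<size>[^ ]*)(?:\s"(?<referer>[^\"]*)") "(?<agent>[^\"]*)" (?<urt>[^\"]*).*/
    time_format %d/%b/%Y:%H:%M:%S %z
    keep_time_key true
    types size:integer,urt:float
  </parse>
</source>
```

The types line is trimmed here to the fields the expression actually captures; the extra entries in the question (reqtime, uct, uht) have no matching named group.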

Incidentally

I noticed you are sometimes escaping " with \ and sometimes not. Is there a reason for this?

CodePudding user response:

You should directly use the nginx parser plugin instead.

Here is a complete working example with the sample input plugin and the nginx parser plugin:

fluent-nginx-test.conf

<source>
  @type sample
  sample [
    { "message": "193.137.78.17 - - [07/Jan/2023:09:22:00 +0000] \"GET /net/api/employee HTTP/1.1\" 200 2323 \"-\" \"Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/108.0.0.0 Safari/537.36\" 0.005" },
    { "message": "193.137.78.18 - - [07/Jan/2023:09:22:00 +0000] \"GET /net/api/employee HTTP/1.1\" 200 2323 \"-\" \"Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/108.0.0.0 Safari/537.36\" 0.005" }
  ]
  rate 1
  size 2
  tag nginx
</source>

<filter nginx>
  @type parser
  key_name message
  <parse>
    @type nginx
  </parse>
</filter>

<match nginx>
  @type stdout
</match>

Run

fluentd -c ./fluent-nginx-test.conf

Output

2023-01-07 14:22:00.000000000 +0500 nginx: {"remote":"193.137.78.17","host":"-","user":"-","method":"GET","path":"/net/api/employee","code":"200","size":"2323","referer":"-","agent":"Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/108.0.0.0 Safari/537.36","http_x_forwarded_for":"0.005"}
2023-01-07 14:22:00.000000000 +0500 nginx: {"remote":"193.137.78.18","host":"-","user":"-","method":"GET","path":"/net/api/employee","code":"200","size":"2323","referer":"-","agent":"Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/108.0.0.0 Safari/537.36","http_x_forwarded_for":"0.005"}
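
Note that in this output the trailing 0.005 ends up under http_x_forwarded_for, not under urt as in the question's regex. If you want to keep the built-in parser but have the value under the question's field name, one possible approach is a record_transformer filter placed after the parser filter and before the match block. This is a minimal sketch and assumes the last field of your log is always the request time; the name urt is taken from the question.

```
# Assumption: the value filed under http_x_forwarded_for is really the request time.
<filter nginx>
  @type record_transformer
  # drop the misleadingly named key once its value has been copied to urt
  remove_keys http_x_forwarded_for
  <record>
    urt ${record["http_x_forwarded_for"]}
  </record>
</filter>
```

The copied value is still a string ("0.005"); if you need a float, the custom regexp with a types line may be the simpler route.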

Environment

  • fluentd
$ fluentd --version
fluentd 1.12.3
  • OS
$ lsb_release -d
Description:    Ubuntu 18.04.6 LTS