Regex parsing custom Apache log with added field after "size" field

Time:11-17

I'm using a regex parser to parse Apache log lines in the standard format plus an extra field, which I'll call 'elapsed', between 'size' and 'referer'. I can't correctly parse the fields that come after the 'elapsed' field. Can you help?

Try it on Rubular.

Line to parse:

10.1.1.1 - - [16/Nov/2022:15:34:38 +0000] "GET /server-status HTTP/1.1" 200 32 0 "-" "kube-probe/1.21"

Regex:

^(?<host>[^ ]*) [^ ]* (?<user>[^ ]*) \[(?<time>[^\]]*)\] "(?<method>\S+)(?: +(?<path>[^ ]*) +\S*)?" (?<code>[^ ]*) (?<size>[^ ]*) (?<elapsed>[^ ]*) (?: "(?<referer>[^\"]*)" "(?<agent>.*)")?
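Rubular tests Ruby regular expressions, so the behavior can also be reproduced in plain Ruby. The following is a minimal sketch using the sample line and the question's pattern as a Ruby regex literal; the variable names are illustrative only.

```ruby
# Sample line from the question.
line = '10.1.1.1 - - [16/Nov/2022:15:34:38 +0000] "GET /server-status HTTP/1.1" 200 32 0 "-" "kube-probe/1.21"'

# The question's pattern. Note the space after (?<elapsed>[^ ]*) followed by an
# optional group that itself begins with a space: two spaces are required there.
question_regex = /^(?<host>[^ ]*) [^ ]* (?<user>[^ ]*) \[(?<time>[^\]]*)\] "(?<method>\S+)(?: +(?<path>[^ ]*) +\S*)?" (?<code>[^ ]*) (?<size>[^ ]*) (?<elapsed>[^ ]*) (?: "(?<referer>[^\"]*)" "(?<agent>.*)")?/

m = question_regex.match(line)
p m[:elapsed]  # => "0"
p m[:referer]  # => nil, the optional group was skipped
p m[:agent]    # => nil
```

The match itself succeeds, which is why the problem is easy to miss: only the optional trailing group is silently dropped.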

Result:

host    10.121.17.62
user    -
time    16/Nov/2022:15:34:38 +0000
method  GET
path    /server-status
code    200
size    32057
elapsed 0
referer  
agent

Expected result:


host    10.121.17.62
user    -
time    16/Nov/2022:15:34:38 +0000
method  GET
path    /server-status
code    200
size    32057
elapsed 0
referer -    
agent   kube-probe/1.21

Answer:

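Remove the space before the optional non-capturing group ((?: "(?<referer>[^\"]*)" "(?<agent>.*)")?) and the problem is solved. The group already begins with a space, so together with the space after (?<elapsed>[^ ]*) the pattern expects two spaces between the elapsed value and the opening quote of the referer. The log line has only one, so the group cannot match; because it is optional, the regex still succeeds without it, and referer and agent are left empty.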
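A minimal Ruby sketch of this fix, using the same sample line. The only change from the question's pattern is the removed space after (?<elapsed>[^ ]*); the variable names are illustrative.

```ruby
line = '10.1.1.1 - - [16/Nov/2022:15:34:38 +0000] "GET /server-status HTTP/1.1" 200 32 0 "-" "kube-probe/1.21"'

# Space before the optional group removed: the group itself supplies the single
# space between the elapsed value and the quoted referer.
fixed_regex = /^(?<host>[^ ]*) [^ ]* (?<user>[^ ]*) \[(?<time>[^\]]*)\] "(?<method>\S+)(?: +(?<path>[^ ]*) +\S*)?" (?<code>[^ ]*) (?<size>[^ ]*) (?<elapsed>[^ ]*)(?: "(?<referer>[^\"]*)" "(?<agent>.*)")?/

m = fixed_regex.match(line)
p m[:elapsed]  # => "0"
p m[:referer]  # => "-"
p m[:agent]    # => "kube-probe/1.21"
```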

However, I would recommend using:

^(?<host>\S+) \S+ (?<user>\S+) \[(?<time>[^\]]*)\] "(?<method>\S+)(?: +(?<path>\S+) +\S*)?" (?<code>\S+) (?<size>\S+) (?<elapsed>\S+)(?: "(?<referer>[^"]*)" "(?<agent>.*)")?

See the regex demo. Here, I replaced [^ ]* with \S+, since that is what is meant: match one or more non-whitespace characters. The quote also needs no escaping inside a character class, so [^\"] becomes [^"].
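The recommended pattern can be checked the same way in Ruby, the engine Rubular uses. This sketch prints every named capture for the sample line, one per line in group order; the variable names are illustrative.

```ruby
line = '10.1.1.1 - - [16/Nov/2022:15:34:38 +0000] "GET /server-status HTTP/1.1" 200 32 0 "-" "kube-probe/1.21"'

# Recommended pattern: \S+ for whitespace-delimited fields, and the optional
# referer/agent group starts with the single separating space.
recommended = /^(?<host>\S+) \S+ (?<user>\S+) \[(?<time>[^\]]*)\] "(?<method>\S+)(?: +(?<path>\S+) +\S*)?" (?<code>\S+) (?<size>\S+) (?<elapsed>\S+)(?: "(?<referer>[^"]*)" "(?<agent>.*)")?/

m = recommended.match(line)
abort "no match" if m.nil?

# Print one "name value" pair per line, in the order the groups are defined.
m.named_captures.each { |name, value| puts format("%-8s %s", name, value) }
```

One practical difference between the two styles: [^ ]* also matches an empty string (and tab characters), whereas \S+ requires at least one non-whitespace character, so a doubled space in a log line can no longer produce an empty capture.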
