I'm using a regex parser to parse Apache log lines in the standard format, plus an extra field between 'size' and 'referer' that I'll call 'elapsed'. The fields after 'elapsed' are not parsed correctly. Can you help?
Line to parse:
10.1.1.1 - - [16/Nov/2022:15:34:38 +0000] "GET /server-status HTTP/1.1" 200 32 0 "-" "kube-probe/1.21"
Regex:
^(?<host>[^ ]*) [^ ]* (?<user>[^ ]*) \[(?<time>[^\]]*)\] "(?<method>\S+)(?: (?<path>[^ ]*) \S*)?" (?<code>[^ ]*) (?<size>[^ ]*) (?<elapsed>[^ ]*) (?: "(?<referer>[^\"]*)" "(?<agent>.*)")?
Result:
host 10.121.17.62
user -
time 16/Nov/2022:15:34:38 +0000
method GET
path /server-status
code 200
size 32057
elapsed 0
referer
agent
Expected result:
host 10.121.17.62
user -
time 16/Nov/2022:15:34:38 +0000
method GET
path /server-status
code 200
size 32057
elapsed 0
referer -
agent kube-probe/1.21
Answer:
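The space before the optional non-capturing group, (?: "(?<referer>[^\"]*)" "(?<agent>.*)")?, must be removed, and that will solve the problem. As written, the pattern needs two consecutive spaces after 'elapsed': the literal one, plus the one at the start of the group. The log line has only one, so the optional group matches nothing and 'referer' and 'agent' stay empty.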
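To make the effect of that one space visible, here is a minimal Python sketch. It rests on a few assumptions: your parser is probably not Python, so the named groups are rewritten from (?<name>...) to Python's (?P<name>...) spelling; the \S in the 'method' group is read as \S+; the timezone in the sample line is taken to be +0000; and LINE, PREFIX and TAIL are just names made up for this illustration.

```python
import re

# Sample line from the question (assuming the timezone offset is "+0000").
LINE = ('10.1.1.1 - - [16/Nov/2022:15:34:38 +0000] '
        '"GET /server-status HTTP/1.1" 200 32 0 "-" "kube-probe/1.21"')

# Everything up to and including 'elapsed', shared by both variants.
# Python spells named groups (?P<name>...) instead of (?<name>...).
PREFIX = (r'^(?P<host>[^ ]*) [^ ]* (?P<user>[^ ]*) \[(?P<time>[^\]]*)\] '
          r'"(?P<method>\S+)(?: (?P<path>[^ ]*) \S*)?" '
          r'(?P<code>[^ ]*) (?P<size>[^ ]*) (?P<elapsed>[^ ]*)')

# The optional trailing group; note that it starts with its own space.
TAIL = r'(?: "(?P<referer>[^"]*)" "(?P<agent>.*)")?'

with_space = re.compile(PREFIX + ' ' + TAIL)  # original: extra space before the group
without_space = re.compile(PREFIX + TAIL)     # fixed: the group supplies the space

broken = with_space.match(LINE).groupdict()
fixed = without_space.match(LINE).groupdict()

print(broken['referer'], broken['agent'])  # None None
print(fixed['referer'], fixed['agent'])    # - kube-probe/1.21
```

Both patterns match the line, which is why no error shows up. With the extra space the optional group is simply skipped, and Python reports the skipped groups as None where your parser shows empty values.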
However, I would recommend using
^(?<host>\S+) \S+ (?<user>\S+) \[(?<time>[^\]]*)\] "(?<method>\S+)(?: (?<path>\S+) \S*)?" (?<code>\S+) (?<size>\S+) (?<elapsed>\S+)(?: "(?<referer>[^"]*)" "(?<agent>.*)")?
Here, I replaced [^ ]* with \S+, since that is what is meant: match one or more non-whitespace characters. By contrast, [^ ]* also accepts an empty field and any character other than a space, tabs included.
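As a quick check of the recommended pattern, here is a small Python sketch. The assumptions: Python is only a stand-in for whatever engine your parser uses, the named groups are written in Python's (?P<name>...) form, the sample line is taken to have a +0000 offset, and parse is a hypothetical helper name.

```python
import re

# Sample line from the question (assuming the timezone offset is "+0000").
LINE = ('10.1.1.1 - - [16/Nov/2022:15:34:38 +0000] '
        '"GET /server-status HTTP/1.1" 200 32 0 "-" "kube-probe/1.21"')

# Recommended pattern, with named groups in Python's (?P<name>...) spelling.
PATTERN = re.compile(
    r'^(?P<host>\S+) \S+ (?P<user>\S+) \[(?P<time>[^\]]*)\] '
    r'"(?P<method>\S+)(?: (?P<path>\S+) \S*)?" '
    r'(?P<code>\S+) (?P<size>\S+) (?P<elapsed>\S+)'
    r'(?: "(?P<referer>[^"]*)" "(?P<agent>.*)")?'
)

def parse(line):
    """Return a dict of the named fields, or None if the line does not match."""
    m = PATTERN.match(line)
    return m.groupdict() if m else None

fields = parse(LINE)
for name, value in fields.items():
    print(name, value)

# How the two character classes differ on edge cases:
empty_ok_old = re.fullmatch(r'[^ ]*', '') is not None     # True: empty field accepted
tab_ok_old = re.fullmatch(r'[^ ]*', 'a\tb') is not None   # True: a tab is not a space
empty_ok_new = re.fullmatch(r'\S+', '') is not None       # False: needs one char
tab_ok_new = re.fullmatch(r'\S+', 'a\tb') is not None     # False: tab is whitespace
```

For the sample line this yields referer "-" and agent "kube-probe/1.21". The last four lines show why \S+ is the stricter choice: it rejects empty fields and embedded tabs that [^ ]* would let through.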