I've been struggling to get a regex working. It's used by Promtail to parse labels from my logs. The problem is that it doesn't work with positive lookahead (I think because Promtail is written in Go?).
Anyway, the logs are web logs; here are a few examples:
INFO: 172.0.0.1:0 - "POST /endpoint1/UNIQUE-ID?key=unique_value HTTP/1.1" 200 OK
INFO: 172.0.0.2:0 - "GET /endpoint/health HTTP/1.1" 200 OK
172.0.0.1:0 - - [04/Mar/2022:10:52:10 -0500] "GET /endpoint2/optimize HTTP/1.1" 200 271
INFO: 172.0.0.3:0 :0 - "GET /endpoint3?key=unique_value HTTP/1.1" 200 OK
Another thing worth pointing out is that the `UNIQUE-ID` is going to be a VIN (vehicle identification number).
The groups I'm looking to create are `ip`, `request`, `endpoint` and `status`. However, because of all the `UNIQUE-ID` values in endpoint1 and the `unique_value` query strings in endpoint1 and endpoint3, using the full endpoint path as a label creates too many streams in Loki (every distinct label value becomes its own stream) and essentially kills it.
My solution regex looks like this:
`(?P<ip>((?:[0-9]{1,3}\.){3}[0-9]{1,3})).*(?P<request>(GET|POST|HEAD|PUT|DELETE|CONNECT|OPTIONS|TRACE|PATCH)).(?P<endpoint>(.*endpoint1\/health)|(.*endpoint1)|(.*)(\?)|(.*) ).*\".(?P<status>([0-9]{3}))`
And it captures the following groups:
ip: `172.0.0.1`, `172.0.0.2`, `172.0.0.1`, `172.0.0.3`
request: `POST`, `GET`, `GET`, `GET`
endpoint: `/endpoint1`, `/endpoint1/health`, `/endpoint2/optimize `, `/endpoint3?`
status: `200`, `200`, `200`, `200`
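To reproduce these captures outside regex101, here is a minimal Go sketch (Go's `regexp` package uses the same RE2 syntax Promtail accepts). It assumes every wildcard in the pattern is `.*`; the names `question` and `capture` are hypothetical. The `/endpoint/health` line is left out because the log sample and the capture list above disagree on it.

```go
package main

import (
	"fmt"
	"regexp"
)

// The question's lookahead-free pattern, with each wildcard written as `.*`
// (an assumption). A raw string literal passes backslashes through unchanged.
var question = regexp.MustCompile(`(?P<ip>((?:[0-9]{1,3}\.){3}[0-9]{1,3})).*(?P<request>(GET|POST|HEAD|PUT|DELETE|CONNECT|OPTIONS|TRACE|PATCH)).(?P<endpoint>(.*endpoint1\/health)|(.*endpoint1)|(.*)(\?)|(.*) ).*\".(?P<status>([0-9]{3}))`)

// capture (hypothetical helper) maps each named group to its matched text,
// skipping the unnamed helper groups.
func capture(re *regexp.Regexp, line string) map[string]string {
	out := map[string]string{}
	m := re.FindStringSubmatch(line)
	if m == nil {
		return out
	}
	for i, name := range re.SubexpNames() {
		if name != "" {
			out[name] = m[i]
		}
	}
	return out
}

func main() {
	lines := []string{
		`INFO: 172.0.0.1:0 - "POST /endpoint1/UNIQUE-ID?key=unique_value HTTP/1.1" 200 OK`,
		`172.0.0.1:0 - - [04/Mar/2022:10:52:10 -0500] "GET /endpoint2/optimize HTTP/1.1" 200 271`,
		`INFO: 172.0.0.3:0 :0 - "GET /endpoint3?key=unique_value HTTP/1.1" 200 OK`,
	}
	for _, l := range lines {
		// %q quotes the value so a trailing space is visible.
		fmt.Printf("%q\n", capture(question, l)["endpoint"])
	}
}
```

Printing with `%q` makes the trailing space in `"/endpoint2/optimize "` and the `?` in `"/endpoint3?"` visible.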
The problem is the endpoints for `/endpoint2/optimize` and `/endpoint3?`: endpoint2 has a trailing space at the end, and endpoint3 includes the `?`. I was able to get this working using positive lookahead with the following regex, but it throws an error in Promtail:
`(?P<ip>((?:[0-9]{1,3}\.){3}[0-9]{1,3})).*(?P<request>(GET|POST|HEAD|PUT|DELETE|CONNECT|OPTIONS|TRACE|PATCH)).(?P<endpoint>(.*endpoint1\/health)|(.*endpoint1)|(.*)(?=\?)|(.*)(?= )).*\".(?P<status>([0-9]{3}))`
Any help would be greatly appreciated! I'm far from knowing my way around regex...
EDIT: Here is an example https://regex101.com/r/FXvnqR/1
Answer:
EDIT: Try this!
`(?P<ip>((?:[0-9]{1,3}\.){3}[0-9]{1,3})).*(?P<request>(GET|POST|HEAD|PUT|DELETE|CONNECT|OPTIONS|TRACE|PATCH)).(?P<endpoint>(/endpoint[1-3]?(?:\/health|\/optimize)?))?.*\".(?P<status>([0-9]{3}))`
https://regex101.com/r/DKqRpL/1
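To check this pattern against all four sample lines with the engine Promtail actually uses, here is a minimal Go sketch. It assumes the wildcards in the pattern are `.*`; `answer`, `sample` and `labels` are hypothetical names.

```go
package main

import (
	"fmt"
	"regexp"
)

// The answer's pattern, with each wildcard written as `.*` (assumption).
// The endpoint group is optional, so ip, request and status still match
// when a path does not start with /endpoint.
var answer = regexp.MustCompile(`(?P<ip>((?:[0-9]{1,3}\.){3}[0-9]{1,3})).*(?P<request>(GET|POST|HEAD|PUT|DELETE|CONNECT|OPTIONS|TRACE|PATCH)).(?P<endpoint>(/endpoint[1-3]?(?:\/health|\/optimize)?))?.*\".(?P<status>([0-9]{3}))`)

// The four sample log lines from the question.
var sample = []string{
	`INFO: 172.0.0.1:0 - "POST /endpoint1/UNIQUE-ID?key=unique_value HTTP/1.1" 200 OK`,
	`INFO: 172.0.0.2:0 - "GET /endpoint/health HTTP/1.1" 200 OK`,
	`172.0.0.1:0 - - [04/Mar/2022:10:52:10 -0500] "GET /endpoint2/optimize HTTP/1.1" 200 271`,
	`INFO: 172.0.0.3:0 :0 - "GET /endpoint3?key=unique_value HTTP/1.1" 200 OK`,
}

// labels (hypothetical helper) returns only the named groups, i.e. the
// values a Promtail labels stage could promote; unnamed groups are skipped.
func labels(line string) map[string]string {
	out := map[string]string{}
	m := answer.FindStringSubmatch(line)
	if m == nil {
		return out
	}
	for i, name := range answer.SubexpNames() {
		if name != "" {
			out[name] = m[i]
		}
	}
	return out
}

func main() {
	for _, line := range sample {
		fmt.Println(labels(line))
	}
}
```

The endpoints come out as `/endpoint1`, `/endpoint/health`, `/endpoint2/optimize` and `/endpoint3`, with no trailing space or `?`. Making `[1-3]` optional is what lets the `/endpoint/health` line through.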
If there are going to be endpoints with numbers other than 1-3, or sub-routes other than `health` or `optimize`, this will need to be edited, but as of now this is your fix, bud.
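If more routes appear later, one lookahead-free alternative (not from the thread; a sketch assuming only the endpoint1 route carries VINs) is to special-case that route and let a negated character class `[^ ?"]+` stand in for the lookahead: it stops consuming at the first space, `?` or quote, so there is nothing to peek at. `general` and `endpointOf` are hypothetical names.

```go
package main

import (
	"fmt"
	"regexp"
)

// Hypothetical generalised pattern, not taken from the thread:
//   - /endpoint1/health is kept whole,
//   - any other /endpoint1... path collapses to /endpoint1 (drops the VIN),
//   - every other path runs up to the first space, '?' or '"'.
// Go prefers the leftmost alternative that still lets the whole match
// succeed, so the endpoint1 cases are chosen before the generic class.
var general = regexp.MustCompile(
	`(?P<ip>(?:[0-9]{1,3}\.){3}[0-9]{1,3}).*?` +
		`(?P<request>GET|POST|HEAD|PUT|DELETE|CONNECT|OPTIONS|TRACE|PATCH) ` +
		`(?P<endpoint>/endpoint1/health|/endpoint1|[^ ?"]+)` +
		`.*?" (?P<status>[0-9]{3})`)

// endpointOf (hypothetical helper) returns the endpoint value, or "" when
// the line does not match at all.
func endpointOf(line string) string {
	m := general.FindStringSubmatch(line)
	if m == nil {
		return ""
	}
	return m[general.SubexpIndex("endpoint")]
}

func main() {
	for _, l := range []string{
		`INFO: 172.0.0.1:0 - "POST /endpoint1/UNIQUE-ID?key=unique_value HTTP/1.1" 200 OK`,
		`INFO: 172.0.0.2:0 - "GET /endpoint/health HTTP/1.1" 200 OK`,
		`172.0.0.1:0 - - [04/Mar/2022:10:52:10 -0500] "GET /endpoint2/optimize HTTP/1.1" 200 271`,
		`INFO: 172.0.0.3:0 :0 - "GET /endpoint3?key=unique_value HTTP/1.1" 200 OK`,
	} {
		fmt.Printf("%q\n", endpointOf(l))
	}
}
```

The trade-off is that any other route that starts carrying IDs in its path will again produce one stream per ID and needs its own alternative ahead of the generic class.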