Regular expression to exclude UUID from capture group

Time:08-03

I'm working with SugarCRM and Splunk.

In the SugarCRM log, a UUID often appears in the middle of the message and breaks it up.

My goal is to create a regular expression that extracts a value using a capture group. The capture group feeds a Splunk dashboard that aggregates errors, so it should contain every part of the message except the UUID. The extracted message also ends at certain special characters.

Here is an example:

Message 1: Job 12345678-1234-1234-1234-123456789012 (JobName1) failed

Message 2: Could not find parent record 12345678-1234-1234-1234-123456789012 in module: mymodule

Message 3: Some text 12345678-1234-1234-1234-123456789012 with specific value=1234

My message terminators are -, = and {

The UUID may or may not be present in the message:

Message 4: Another error message without id but with value=12344

What I would like to get from Splunk is a capture group called error_msg that contains, for the previous examples, the text:

error_msg 1: Job (JobName1) failed

error_msg 2: Could not find parent record in module: mymodule

error_msg 3: Some text with specific value

error_msg 4: Another error message without id but with value

In short: extract the text without the UUID, stopping when one of the terminator characters -, = or { is found.

I tried something like this, without success:

(?P<error_msg>(?:\w{8}-\w{4}-\w{4}-\w{4}-\w{12})([^-|=|-|\{]+))

Can anyone help me with that?

Thanks a lot for your time :)

CodePudding user response:

Instead of trying to capture everything except a possibly-present UUID, just remove the UUID and then collapse the extra spaces:

index=ndx sourcetype=srctp message=*
| eval error_msg=replace(message,"\w{8}-\w{4}-\w{4}-\w{4}-\w{12}","")
| eval error_msg=replace(error_msg,"\s+"," ")
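
The remove-then-collapse idea can be checked outside Splunk against the four sample messages from the question. The following is only a minimal Python sketch, not SPL: the function name error_msg and the constants are illustrative, and the step that cuts the text at the first terminator (-, = or {) is an assumption added here to cover the question's stated requirement, since the answer's search does not do it.

```python
import re

# Same UUID shape as the pattern passed to replace() in the answer.
UUID_RE = re.compile(r"\w{8}-\w{4}-\w{4}-\w{4}-\w{12}")
# The asker's terminators: '-', '=' and '{' (assumed step, not in the answer).
TERMINATOR_RE = re.compile(r"[-={]")


def error_msg(message: str) -> str:
    """Drop UUIDs, cut at the first terminator, then collapse whitespace."""
    without_uuid = UUID_RE.sub("", message)
    # The UUID is removed first so its own hyphens do not count as terminators.
    head = TERMINATOR_RE.split(without_uuid, maxsplit=1)[0]
    return re.sub(r"\s+", " ", head).strip()


samples = [
    "Job 12345678-1234-1234-1234-123456789012 (JobName1) failed",
    "Could not find parent record 12345678-1234-1234-1234-123456789012 in module: mymodule",
    "Some text 12345678-1234-1234-1234-123456789012 with specific value=1234",
    "Another error message without id but with value=12344",
]

for sample in samples:
    print(error_msg(sample))
```

The order matters in this sketch: because - is both a terminator and part of every UUID, truncating before removing the UUID would cut the message at the UUID's first hyphen.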

If you know that the UUID is always surrounded by whitespace, you could narrow the first replace() like this, replacing the match with a single space so that the neighbouring words stay separated:

| eval error_msg=replace(message,"\s\w{8}-\w{4}-\w{4}-\w{4}-\w{12}\s"," ")
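
With a whitespace-bounded pattern, the replacement string decides whether the words on either side of the UUID stay separated. The Python sketch below is only an illustration of that regex behaviour, not SPL; the helper name strip_bounded_uuid and its replacement parameter are hypothetical.

```python
import re

# UUID pattern that also consumes one whitespace character on each side.
BOUNDED_UUID_RE = re.compile(r"\s\w{8}-\w{4}-\w{4}-\w{4}-\w{12}\s")


def strip_bounded_uuid(message: str, replacement: str = " ") -> str:
    """Replace a whitespace-bounded UUID with the given replacement string."""
    return BOUNDED_UUID_RE.sub(replacement, message)


msg = "Job 12345678-1234-1234-1234-123456789012 (JobName1) failed"

# A single space keeps "Job" and "(JobName1)" apart.
print(strip_bounded_uuid(msg, " "))
# An empty replacement also swallows both surrounding spaces.
print(strip_bounded_uuid(msg, ""))

# A UUID at the very end has no trailing whitespace, so it is left in place.
tail = "Could not find parent record 12345678-1234-1234-1234-123456789012"
print(strip_bounded_uuid(tail))
```

The last case shows the limit of the whitespace assumption: a UUID at the start or end of a message is not matched by this pattern.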