I am trying to parse out several dynamic strings via Grok/Regex that exist in log messages between (). For example (SenderPartyName below):
2021/05/23 16:01:26.094 High Messaging.Message.Delivered Id(ci1653336085475.12327434@test_te) MessageId(EPIUM#1130754#84601671) SenderPartyName(Mcdonalds (CFH) Restaurant Glen) ReceiverPartyName(TEST_HERE_AGAIN) SenderRoutingId(08Mdsfkm853)
I want to parse out each key-value pair in the string that follows the () format. Here is my grok pattern so far. I've been testing with https://grokdebug.herokuapp.com/
%{DATESTAMP:ts} %{WORD:loglevel} %{DATA:reason}\s ?(Id\(%{DATA:id}\))? ?(MessageId\(%{DATA:originalmessageid}\))? ?(SenderPartyName\((?<senderpartyname>.+?\).+?)\))? ?(ReceiverPartyName\(%{DATA:receiverpartyname}\))? ?(SenderRoutingId\(%{DATA:senderroutingid}\))?
This works when there are () within the nested string, like this:
Mcdonalds (CFH) Restaurant Glen
...but it is dynamic and could appear without (), like this:
Mcdonalds Restaurant Glen
I am trying to build a regex that accounts for both scenarios with this portion of the grok pattern:
?(SenderPartyName\((?<senderpartyname>.+?\).+?)\))?
Currently, though, this parses the non-parenthesis case like this:
"senderpartyname": "Mcdonalds Restaurant Glen) ReceiverPartyName(TEST_HERE_AGAIN"
...where the desired state is one of the following, depending on the string:
"senderpartyname": "Mcdonalds Restaurant Glen"
or
"senderpartyname": "Mcdonalds (CFH) Restaurant Glen"
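The overshoot described above can be reproduced outside Logstash. The following is a minimal sketch, assuming Python's re module treats the lazy quantifiers the same way as Grok's regex engine for this portion of the pattern; the names PORTION and capture are made up for illustration, and Python spells the named group (?P<name>...) rather than (?<name>...).

```python
import re

# Python rendering of the SenderPartyName portion of the grok pattern.
# The group demands ".+?", a literal ")", and ".+?" again, so it always
# needs a ")" INSIDE the captured value.
PORTION = re.compile(r"SenderPartyName\((?P<senderpartyname>.+?\).+?)\)")

nested = "SenderPartyName(Mcdonalds (CFH) Restaurant Glen) ReceiverPartyName(TEST_HERE_AGAIN)"
flat = "SenderPartyName(Mcdonalds Restaurant Glen) ReceiverPartyName(TEST_HERE_AGAIN)"


def capture(line):
    """Return the senderpartyname capture, or None when nothing matches."""
    m = PORTION.search(line)
    return m.group("senderpartyname") if m else None


print(capture(nested))  # Mcdonalds (CFH) Restaurant Glen
print(capture(flat))    # Mcdonalds Restaurant Glen) ReceiverPartyName(TEST_HERE_AGAIN
```

In the flat case the mandatory inner \) is satisfied by the real closing parenthesis, so the second .+? has to run on to the next ) in the line.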
CodePudding user response:
You can use
%{DATESTAMP:ts}\s+%{WORD:loglevel}\s+%{DATA:reason}\s+Id\(%{DATA:id}\)(?:\s+MessageId\(%{DATA:originalmessageid}\))?(?:\s+SenderPartyName(?<senderpartyname>\((?:[^()]+|\g<senderpartyname>)*\)))?(?:\s+ReceiverPartyName\(%{DATA:receiverpartyname}\))?(?:\s+SenderRoutingId\(%{DATA:senderroutingid}\))?
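Note that I restructured the pattern: each optional field is now wrapped in an optional non-capturing group that starts with one or more whitespaces, and the field inside that group is an obligatory pattern. Making the whole sequence optional, rather than each piece on its own, makes matching more efficient.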
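The "optional as a sequence" idea can be tried in isolation. This is only a sketch with Python's re module, assuming %{DATA} behaves like the lazy .*? and using two of the non-nested fields; FIELDS and parse are hypothetical names, not part of Grok or Logstash.

```python
import re

# Each optional field is one group: whitespace + key + "(" + value + ")".
# Inside the group everything is mandatory; only the group as a whole is optional.
FIELDS = re.compile(
    r"Id\((?P<id>.*?)\)"
    r"(?:\s+MessageId\((?P<originalmessageid>.*?)\))?"
    r"(?:\s+ReceiverPartyName\((?P<receiverpartyname>.*?)\))?"
)


def parse(line):
    """Return a dict of captured fields (missing ones are None), or None."""
    m = FIELDS.search(line)
    return m.groupdict() if m else None


print(parse("Id(abc) MessageId(m1) ReceiverPartyName(X)"))
print(parse("Id(abc) ReceiverPartyName(X)"))
```

When MessageId is absent, its group is skipped as a unit and the next optional group is tried from the same position.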
The main change is (?:\s+SenderPartyName(?<senderpartyname>\((?:[^()]+|\g<senderpartyname>)*\)))?, which matches:
(?: - start of a non-capturing group:
\s+ - one or more whitespaces
SenderPartyName - a fixed word
(?<senderpartyname>\((?:[^()]+|\g<senderpartyname>)*\)) - Group "senderpartyname": a ( char (matched with \(), then zero or more repetitions of any chars other than ( and ) or the Group "senderpartyname" pattern recursed (see (?:[^()]+|\g<senderpartyname>)*), and then a ) char (matched with \))
)? - end of the group, one or zero repetitions (optional)
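To see what the recursive group accepts, here is a rough Python sketch. Python's standard re module has no \g<name> recursion, so this swaps in a plain depth counter that accepts the same balanced-parentheses spans; balanced_span and sender_party_name are hypothetical helper names. Because the named group in the pattern wraps the outer \( and \), its capture includes those parentheses, and the sketch strips them.

```python
from typing import Optional


def balanced_span(text: str, start: int) -> Optional[str]:
    """Return the balanced "(...)" substring opening at text[start], else None."""
    if start >= len(text) or text[start] != "(":
        return None
    depth = 0
    for i in range(start, len(text)):
        if text[i] == "(":
            depth += 1          # entering a nested group (the "recursion" step)
        elif text[i] == ")":
            depth -= 1          # leaving a group
            if depth == 0:      # the outermost "(" has been closed
                return text[start:i + 1]
    return None                 # ran out of text: parentheses are unbalanced


def sender_party_name(line: str) -> Optional[str]:
    """Extract the SenderPartyName value without its outer parentheses."""
    key = "SenderPartyName"
    pos = line.find(key + "(")
    if pos == -1:
        return None
    span = balanced_span(line, pos + len(key))
    return span[1:-1] if span else None


nested = "SenderPartyName(Mcdonalds (CFH) Restaurant Glen) ReceiverPartyName(TEST_HERE_AGAIN)"
flat = "SenderPartyName(Mcdonalds Restaurant Glen) ReceiverPartyName(TEST_HERE_AGAIN)"
print(sender_party_name(nested))  # Mcdonalds (CFH) Restaurant Glen
print(sender_party_name(flat))    # Mcdonalds Restaurant Glen
```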