I'm after a regex that captures the mailbox name on the first line and then the corresponding value on the following line as a 'count' group. Below is a sample; there are roughly 14 mail addresses in total:
[email protected]
705
[email protected]
14
[email protected]
29
reassuredtest.com
20
[email protected]
8
I have used Rubular and I'm able to capture the 'count' values there, but when I use the same regex in Splunk I only capture the first value, 705. I believe it's falling foul of the '.' in 'Go.live':
[a-z\.]+@test.com\r\n](?<count>[^\n\r]+)
Would someone kindly help me with a query that cycles through the data, capturing the mailbox name as one capture group and the count value that follows it as another?
Answer:
To get more than the first match from a regex in Splunk you must use the max_match option to the rex command (max_match=0 means no limit). Once all of the mailbox names and counts are extracted, they're paired up with mvzip, split into separate events with mvexpand, and then broken apart again into separate fields. If you try to split the events without pairing up the names and counts first, you'll lose the association between each name and its count.
Here's a run-anywhere example query. Note that I had to fix the fourth email address.
| makeresults | eval _raw="[email protected]
705
[email protected]
14
[email protected]
29
[email protected]
20
[email protected]
8"
```Commands above create demo data. Delete IRL```
```Extract the mailbox names and counts```
| rex max_match=0 "(?<mailbox>[^@]+)@[\s\S]+?(?<count>\d+)"
```Combine each mailbox name with its count```
| eval results=mvzip(mailbox,count)
```Remove line ends```
| eval results=trim(results,"
")
```Separate each name/count pair into their own events```
| mvexpand results
```Break out the mailbox and count values into separate fields again```
| eval results=split(results,",")
| eval mailbox=mvindex(results, 0), count=mvindex(results, 1)
```Display the results```
| table mailbox count
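If you want to sanity-check the extraction logic outside Splunk, here is a minimal Python sketch of the same idea. It is not part of the Splunk query: the addresses in SAMPLE are hypothetical placeholders, extract_pairs is a made-up helper name, and Python needs the (?P<name>...) spelling for named groups. finditer plays roughly the role of max_match=0 by returning every match, and strip() does the job of the "Remove line ends" step, because [^@]+ also swallows the newline left over from the previous count.

```python
import re

# Hypothetical sample shaped like the event in the question:
# an address on one line, its count on the next.
SAMPLE = "alpha@test.com\n705\nbeta.user@test.com\n14\ngo.live@test.com\n29\n"

# Same shape as the rex pattern: text before '@' is the mailbox, then a
# lazy skip over the domain and line break, then the digits of the count.
PATTERN = re.compile(r"(?P<mailbox>[^@]+)@[\s\S]+?(?P<count>\d+)")


def extract_pairs(text):
    """Return a (mailbox, count) tuple for every match in text."""
    return [
        # strip() drops the leading newline that [^@]+ picks up after the
        # previous count, mirroring the trim step in the query.
        (match.group("mailbox").strip(), int(match.group("count")))
        for match in PATTERN.finditer(text)
    ]


pairs = extract_pairs(SAMPLE)
for mailbox, count in pairs:
    print(mailbox, count)
```

Because each name and its count come out of the same match, they stay associated, which is what mvzip preserves in the query. One caveat under these assumptions: a domain containing digits would make the lazy skip stop early, so the pattern would need tightening for such data.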