Please help me write a regular expression in bash for the lines below. There are multiple lines like this in my file, and I want to capture only the values listed below. How can I do so?
Values to be extracted:
Y324256
vmigha3
idea
Note: the value to be extracted is always alphanumeric. The square brackets to its left and right always contain numeric values, which I don't want to extract.
The file looks like this:
2021-05-13 15:35:31,804 [16] [Y324256] [341745] DEBUG Server - End Webservice method GetProcessStatus
2021-05-13 15:35:32,587 [11] [vmigha3] [341749] DEBUG Domain - Reading user permissions from the database
2022-03-03 09:08:10,699 [31] [idea] [80387] INFO Server - Begin Webservice call
2022-04-06 01:18:33,822 [MonitorThread] INFO - Reading user permissions from the database
2022-04-06 01:18:33,845 [None] DEBUG -Begin Webservice call
My code:
grep -oP '\[\K\d*[A-Za-z][\dA-Za-z]*(?=])' file > output_file
But this gives me the result below:
Y324256
vmigha3
idea
MonitorThread
None
I am wondering whether this can be done by first finding the lines that begin with a timestamp in this format (for example 2022-04-06 01:18:33,845) and then capturing the 4th field.
CodePudding user response:
Assumptions/Understandings:
- if there are 3x sets of [...] in the line then print the contents of the 2nd [...]
- otherwise skip the line
One awk idea using dual field delimiters of ] and [ (and no need for a regex):
$ awk -F'[][]' 'NF>6 {print $4}' file
Y324256
vmigha3
idea
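To see why $4 is the wanted value and why NF>6 filters the other lines, it can help to print every field awk produces for one line. This is only an illustrative sketch using a sample line from the question; the variable names line and fields are my own.

```shell
# Sample line copied from the question.
line='2021-05-13 15:35:31,804 [16] [Y324256] [341745] DEBUG Server - End Webservice method GetProcessStatus'

# Split on either bracket character and print each field with its index.
# Three [...] pairs give 7 fields; the 2nd pair's contents land in field 4.
fields=$(printf '%s\n' "$line" |
  awk -F'[][]' '{ for (i = 1; i <= NF; i++) printf "%d=<%s>\n", i, $i }')

printf '%s\n' "$fields"
```

A line with a single [...] pair, such as the [MonitorThread] one, splits into only 3 fields, so the NF>6 test skips it.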
CodePudding user response:
Does it have to be grep? I mean:
cut -d'[' -f3 file | cut -d']' -f1
If it has to be, then just match the brackets.
grep -oP '^[^[]*\[[^[]*\[\K[^]]*' file
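If grep -P is not available, or you also want to require the timestamp at the start of the line and numeric values in the neighbouring brackets, a sed variant is one possible alternative. This is a rough sketch rather than a drop-in replacement for the commands above: the function name extract_user is hypothetical, and the pattern assumes the exact layout shown in the question.

```shell
# Hypothetical helper: print the middle bracketed value only when the line
# starts with a date and the brackets on both sides hold digits.
extract_user() {
  sed -nE 's/^[0-9]{4}-[0-9]{2}-[0-9]{2} [0-9:,]+ \[[0-9]+\] \[([^]]+)\] \[[0-9]+\].*/\1/p' "$@"
}

# Run it against the sample lines from the question.
result=$(extract_user <<'EOF'
2021-05-13 15:35:31,804 [16] [Y324256] [341745] DEBUG Server - End Webservice method GetProcessStatus
2021-05-13 15:35:32,587 [11] [vmigha3] [341749] DEBUG Domain - Reading user permissions from the database
2022-03-03 09:08:10,699 [31] [idea] [80387] INFO Server - Begin Webservice call
2022-04-06 01:18:33,822 [MonitorThread] INFO - Reading user permissions from the database
2022-04-06 01:18:33,845 [None] DEBUG -Begin Webservice call
EOF
)

printf '%s\n' "$result"
```

With a real file you would call it as extract_user file > output_file.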