I have a log like the one below. I need to keep only the ERROR lines whose REQ ID is repeatedly in the ERROR state.
For example, in the log below the expected output is REQ3: it is in ERROR on line 3 and is still in ERROR after the retry. (There can be infinitely many retries, but we can check whether the ID stays in ERROR continuously over, say, a day.) REQ2 was in ERROR on line 2 but changed to INFO on line 5, so it should not appear in the output.
....
yyyy-mm-dd [INFO] REQ1 Context 1
yyyy-mm-dd [ERROR] REQ2 Context 2
yyyy-mm-dd [ERROR] REQ3 Context 3
yyyy-mm-dd [INFO] REQ1 Context 4
yyyy-mm-dd [INFO] REQ2 Context 5
yyyy-mm-dd [ERROR] REQ3 Context 6
....
CodePudding user response:
One quick solution is to first find all request IDs for which a non-error message was printed. You can do that with something like the following.
Assuming your log lines are in test.txt:
grep -v "\[ERROR\]" test.txt | cut -d ' ' -f 3 | tr '\n' '|'
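This produces all the request IDs you do not want, separated by |.
You can then use this pipe-separated list as the pattern of an inverted grep over the IDs in the full log file, which keeps only the ones you need. Drop the trailing | first: with GNU grep an empty alternative matches every line, so the inverted grep would print nothing.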
┌─[dspataro@dspataroarchazzo]─(/tmp)
└─[15:17]--[$] grep -v "\[ERROR\]" test.txt | cut -d ' ' -f 3 | tr '\n' '|'
REQ1|REQ1|REQ2|
┌─[dspataro@dspataroarchazzo]─(/tmp)
└─[15:17]--[$] R="REQ1|REQ1|REQ2"
┌─[dspataro@dspataroarchazzo]─(/tmp)
└─[15:17]--[$] cut -d ' ' -f 3 test.txt | sort | uniq | grep -E -v "$R"
REQ3
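The two steps above can also be chained so the list does not have to be copied by hand. The following is only a sketch, assuming the request ID is always the third space-separated field as in the sample log; the function name only_error_ids and the scratch file are made up for illustration. It uses grep -x -F so that IDs are compared as whole literal strings, which avoids REQ1 also excluding something like REQ10.

```shell
#!/bin/sh
# Sketch: print IDs that appear on ERROR lines and never on a non-ERROR line.
# Assumes the ID is the 3rd space-separated field, as in the sample log.
only_error_ids() {
    log=$1
    ok_ids=$(mktemp)   # scratch file holding IDs seen with a non-ERROR level
    grep -v '\[ERROR\]' "$log" | cut -d ' ' -f 3 | sort -u > "$ok_ids"
    # -x: whole-line match, -F: fixed strings, -f: read patterns from file
    grep '\[ERROR\]' "$log" | cut -d ' ' -f 3 | sort -u | grep -v -x -F -f "$ok_ids"
    rm -f "$ok_ids"
}

# Sample log from the question, written to a temporary file
sample=$(mktemp)
cat > "$sample" <<'EOF'
yyyy-mm-dd [INFO] REQ1 Context 1
yyyy-mm-dd [ERROR] REQ2 Context 2
yyyy-mm-dd [ERROR] REQ3 Context 3
yyyy-mm-dd [INFO] REQ1 Context 4
yyyy-mm-dd [INFO] REQ2 Context 5
yyyy-mm-dd [ERROR] REQ3 Context 6
EOF

only_error_ids "$sample"   # prints REQ3
```

Like the original commands, this ignores the order of the lines: an ID is dropped if it has a non-ERROR line anywhere in the file.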
CodePudding user response:
This awk command should do the job:
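awk '
# Create an entry for each ID seen in ERROR; flag it with 1 once a later non-ERROR line appears for it.
{ if ($2=="[ERROR]") a[$3]; else if ($3 in a) a[$3]=1 }
# Print the IDs that were never flagged.
END { for (id in a) if (!a[id]) print id }
' file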
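If "repeatedly" is taken to mean at least two consecutive ERROR lines with no recovery afterwards, one possible variant counts the current ERROR streak per ID instead. This is a sketch under that interpretation, which may not be exactly what the question intends; the function name repeated_error_ids and the threshold parameter min are hypothetical.

```shell
#!/bin/sh
# Sketch: print IDs whose most recent lines are at least `min` ERRORs in a row.
# Usage: repeated_error_ids LOGFILE [MIN]   (MIN defaults to 2)
repeated_error_ids() {
    awk -v min="${2:-2}" '
        $2 == "[ERROR]" { streak[$3]++; next }   # one more consecutive ERROR for this ID
        NF >= 3         { streak[$3] = 0 }       # any other level resets the streak
        END { for (id in streak) if (streak[id] >= min) print id }
    ' "$1" | sort
}

# Sample log from the question, plus REQ4 with a single ERROR
sample=$(mktemp)
cat > "$sample" <<'EOF'
yyyy-mm-dd [INFO] REQ1 Context 1
yyyy-mm-dd [ERROR] REQ2 Context 2
yyyy-mm-dd [ERROR] REQ3 Context 3
yyyy-mm-dd [INFO] REQ1 Context 4
yyyy-mm-dd [INFO] REQ2 Context 5
yyyy-mm-dd [ERROR] REQ3 Context 6
yyyy-mm-dd [ERROR] REQ4 Context 7
EOF

repeated_error_ids "$sample"   # prints REQ3
```

Here REQ4 is not reported because it has only one ERROR so far, and an ID that recovers and then fails again starts a new streak from zero.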