Logfile reformatting with regex

Time:02-08

I'm using grep to filter certain lines from a log file and present them to my conky config. The log file is /var/log/messages. The entries pertain to UFW block events.

The trouble is that I only care about certain strings in each line. I can grep only the UFW blocks, but each line is too long to fit in conky. Even if conky were not part of the equation, learning to show only pieces of a log line would benefit me in future.

I have got somewhere by using the following:

grep -Ewoh '(IN=([a-z]){4,})|((DST|SRC)=(([0-9]){1,3}\.){3,}([0-9]){1,3})|(PROTO=[a-z]{2,6})|((SPT|DPT)=[0-9]{1,5})' /var/log/messages

This ugly-looking regex filters for entries like these:

IN=wlan0
SRC=10.10.123.23
DST=192.168.41.23
PROTO=TCP
SPT=443
DPT=41080

Broken down (nearly) line for line, the regex is:

'
IN=([a-z]){4,}
(DST|SRC)=(([0-9]){1,3}\.){3,}([0-9]){1,3}
PROTO=[a-z]{2,6}
(SPT|DPT)=[0-9]{1,5}
'

The problem is that this prints each match on its own line, whereas I want the matches from each log line kept together on that line.

$ grep -Ewoh '(IN=([a-z]){4,})|((DST|SRC)=(([0-9]){1,3}\.){3,}([0-9]){1,3})|(PROTO=[a-z]{2,6})|((SPT|DPT)=[0-9]{1,5})' /var/log/messages
IN=wlan
SRC=103.81.76.20
DST=172.31.77.54
SPT=443
DPT=41080
$

I'd rather not use a very complicated awk method unless I can walk back into it a few months later and remember it easily. awk is incredible, but it can be difficult to digest if you drop the ball just once!

Thank you.

CodePudding user response:

If I understand correctly, instead of the list produced by grep -o, you want to drop the non-matching strings and print only the matching strings in place, i.e. on the lines and in the order in which they appear.

Using gawk's FPAT:

gawk -v FPAT='my-regex' '$1=$1' /var/log/messages
  • Replace my-regex with the regex for strings you want to see.
  • This will print the matches on each line, in order, delimited by a space.
  • Add -v OFS= to remove the space, or for example, -v OFS=', ' to change the delimiting string.
  • You were using grep -w to match whole words. You can do this in a gawk regex by using \< and \> for the left and right word boundary, respectively. gawk processes escape sequences in -v assignments, so it is safest to write each backslash twice there (\\<, \\>, \\.).
  • For example, add parentheses and word boundaries around the whole list of 'or' operators (|):
    -v FPAT='\\<((IN=([a-z]){4,})|((DST|SRC)=(([0-9]){1,3}\\.){3,}([0-9]){1,3})|(PROTO=[a-z]{2,6})|((SPT|DPT)=[0-9]{1,5}))\\>'
  • Note the bugs in your regex, such as the one tshiono commented on: [a-z] is lowercase only, so it won't match PROTO=TCP.