I'm no programmer or developer by any stretch of the imagination, and I've come across a Python 3 programming issue that I can't figure out how to solve.
I have a firewall/router that uses Suricata as an IDPS that saves log file data in a file named alerts.log. The Suricata logs are filled with hundreds of line entries, such as:
11/10/2022-12:47:06.318702 [**] [1:2210038:2] SURICATA STREAM FIN out of window [**] [Classification: Generic Protocol Command Decode] [Priority: 3] {TCP}
The above is a one-line entry from one of the Suricata log files. In that line you will see -
SURICATA STREAM FIN out of window
and [1:2210038:2]
What I really need from those 2 snippets is an entry in a separate text file formatted as:
# SURICATA STREAM FIN out of window
1:2210038
I've been trying to use the str.split() method to accomplish this with no luck.
I've been using the following code as a test in my IDE (PyCharm) to see if I can display the text I need in the format I need it:
infile = open('alerts.log', "r")
print("\n")
for line in infile:
    message = (line.split('[**]')[1].split('] ')[1].strip())
    sid = (line.split(' [')[2])
    print("# " + message)
    print(sid)
    print("\n")
The above code results in the following which is the closest I've been able to get to what I need -
# SURICATA STREAM FIN out of window
1:2210038:2] SURICATA STREAM FIN out of window
On the 2nd line, if I could just remove the second colon and everything to the right, that would be what I need. Anyone have any ideas to assist a non-programmer with this? Thank you!
CodePudding user response:
A regular expression that matches the patterns you want:
import re
line = '11/10/2022-12:47:06.318702 [**] [1:2210038:2] SURICATA STREAM FIN out of window [**] [Classification: Generic Protocol Command Decode] [Priority: 3] {TCP}'
m = re.search(r'\[(\d+:\d+):\d+] (.*?) \[', line)
if m:
    print('#', m.group(2))
    print(m.group(1))
Output:
# SURICATA STREAM FIN out of window
1:2210038
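To apply the same pattern to every line of a log and write the result to a separate text file, a minimal sketch could look like the following. The function names extract_rules and write_rules are hypothetical, and the sketch assumes every alert line follows the format shown in the question; lines that do not match are simply skipped.

```python
import re

# Same pattern as above: group 1 is "1:2210038", group 2 is the message.
PATTERN = re.compile(r'\[(\d+:\d+):\d+] (.*?) \[')

def extract_rules(lines):
    """Yield (message, sid) pairs for every line that matches PATTERN."""
    for line in lines:
        m = PATTERN.search(line)
        if m:
            yield m.group(2), m.group(1)

def write_rules(in_path, out_path):
    """Read an alerts log and write '# message' / sid pairs to out_path."""
    with open(in_path, "r") as infile, open(out_path, "w") as outfile:
        for message, sid in extract_rules(infile):
            outfile.write("# {}\n{}\n".format(message, sid))
```

Calling write_rules('alerts.log', 'sids.txt') would then produce the two-line entries described in the question; sids.txt is only a placeholder name for the output file.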
Here's a verbose version if you want to understand the expression. Note that whitespace has to be explicit in verbose mode.
import re
line = '11/10/2022-12:47:06.318702 [**] [1:2210038:2] SURICATA STREAM FIN out of window [**] [Classification: Generic Protocol Command Decode] [Priority: 3] {TCP}'
m = re.search(r'''(?x)   # enable verbose mode (comments)
    \[                   # match an open bracket
    (\d+:\d+)            # capture digits, colon, digits
    :\d+]                # match colon, digits and close bracket
    \s                   # match a whitespace character
    (.*?)                # capture non-greedy everything up to...
    \s\[                 # whitespace and open bracket.
    ''', line)
if m:
    print('#', m.group(2))
    print(m.group(1))
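The note about explicit whitespace can be checked with a small experiment: in verbose mode a bare space in the pattern is ignored, so it has to be written as \s, an escaped space, or [ ]. The two patterns below are illustrative only and are not part of the solution.

```python
import re

text = "window [**]"

# In verbose mode the bare space before \[ is ignored, so this pattern
# really means 'window\[' and does not match "window [".
ignored = re.search(r'(?x) window \[', text)

# Writing the space explicitly as \s restores the match.
explicit = re.search(r'(?x) window \s \[', text)

print(ignored)            # None
print(explicit.group(0))  # window [
```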
CodePudding user response:
Assuming every line in the log file follows the same pattern, you can use this inside your loop:
.split(' [**] ')[1].split(' ', 1)
So your final code result would be like this:
for line in infile:
    # message[0] is the bracketed ID, message[1] is the alert text
    message = line.split(' [**] ')[1].split(' ', 1)
    print("MESSAGE: {}".format(message[1]))
    print("SID: {}".format(message[0]))
And this would be your final result:
MESSAGE: SURICATA STREAM FIN out of window
SID: [1:2210038:2]
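For completeness, the exact format asked for in the question can also be reached with string methods alone. This is only a sketch: it assumes every line contains ' [**] ' followed by a bracketed ID with three colon-separated numbers, and the helper name parse_alert is hypothetical.

```python
def parse_alert(line):
    """Return (message, sid) from one alert line, using only str methods."""
    # The text after the first ' [**] ' marker looks like
    # '[1:2210038:2] SURICATA STREAM FIN out of window'.
    middle = line.split(' [**] ')[1]
    bracketed, message = middle.split(' ', 1)
    # Drop the surrounding brackets, then cut off the last colon and
    # everything to the right of it.
    sid = bracketed.strip('[]').rsplit(':', 1)[0]
    return message.strip(), sid

line = ('11/10/2022-12:47:06.318702 [**] [1:2210038:2] SURICATA STREAM FIN '
        'out of window [**] [Classification: Generic Protocol Command Decode] '
        '[Priority: 3] {TCP}')
message, sid = parse_alert(line)
print('#', message)
print(sid)
```

Here rsplit(':', 1) splits once from the right, which is what removes the second colon and everything after it.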