Line split a log file

I'm no programmer or developer by any stretch of the imagination, and I've come across a Python 3 programming issue that I can't figure out how to solve.

I have a firewall/router that uses Suricata as an IDPS that saves log file data in a file named alerts.log. The Suricata logs are filled with hundreds of line entries, such as:

11/10/2022-12:47:06.318702  [**] [1:2210038:2] SURICATA STREAM FIN out of window [**] [Classification: Generic Protocol Command Decode] [Priority: 3] {TCP}

The above is a single-line entry from one of the Suricata log files. In that line you will see -

SURICATA STREAM FIN out of window and [1:2210038:2]

What I really need from those 2 snippets is an entry in a separate text file formatted as:

# SURICATA STREAM FIN out of window
1:2210038

I've been trying to use the str.split() method to accomplish this with no luck.

I've been using the following code as a test in my IDE (PyCharm) to see if I can display the text I need in the format I need it:

infile = open('alerts.log', "r")
print("\n")
for line in infile:
    message = (line.split('[**]')[1].split('] ')[1].strip())
    sid = (line.split(' [')[2])
    print("# " + message)
    print(sid)
print("\n")

The above code results in the following which is the closest I've been able to get to what I need -

# SURICATA STREAM FIN out of window
1:2210038:2] SURICATA STREAM FIN out of window

On the 2nd line, if I could just remove the second colon and everything to the right, that would be what I need. Anyone have any ideas to assist a non-programmer with this? Thank you!

CodePudding user response：

A regular expression that matches the patterns you want:

import re

line = '11/10/2022-12:47:06.318702  [**] [1:2210038:2] SURICATA STREAM FIN out of window [**] [Classification: Generic Protocol Command Decode] [Priority: 3] {TCP}'

m = re.search(r'\[(\d+:\d+):\d+] (.*?) \[', line)
if m:
    print('#', m.group(2))
    print(m.group(1))

Output:

# SURICATA STREAM FIN out of window
1:2210038

Here's a verbose version if you want to understand the expression. Note that whitespace has to be explicit in verbose mode.

import re

line = '11/10/2022-12:47:06.318702  [**] [1:2210038:2] SURICATA STREAM FIN out of window [**] [Classification: Generic Protocol Command Decode] [Priority: 3] {TCP}'

m = re.search(r'''(?x)       # enable verbose mode (comments)
                  \[         # match an open bracket
                  (\d+:\d+)  # capture digits colon digits
                  :\d+]      # match colon, digits and close bracket
                  \s         # match a whitespace
                  (.*?)      # capture non-greedy everything up to...
                  \s\[       #   whitespace and open bracket.
                  ''', line)
if m:
    print('#', m.group(2))
    print(m.group(1))
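
To run the same pattern over the whole file and write the results to a separate text file, as the question asks, something like the following sketch might work. The names `extract_rules` and `write_rules` and the output file name `sids.txt` are made up for illustration. The sketch assumes every alert line has the layout shown in the question; lines that don't match are skipped.

```python
import re

# Same pattern as above: capture "digits:digits" and the message text.
ALERT_RE = re.compile(r'\[(\d+:\d+):\d+\] (.*?) \[')


def extract_rules(lines):
    """Yield (message, sid) pairs for each line that matches the pattern."""
    for line in lines:
        m = ALERT_RE.search(line)
        if m:
            yield m.group(2), m.group(1)


def write_rules(in_path, out_path):
    """Write a '# message' line followed by an ID line for every matching alert."""
    with open(in_path) as infile, open(out_path, 'w') as outfile:
        for message, sid in extract_rules(infile):
            outfile.write('# {}\n{}\n'.format(message, sid))
```

Calling `write_rules('alerts.log', 'sids.txt')` would then produce the two-line entries in `sids.txt`. The `with` statement closes both files even if an error occurs partway through.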

CodePudding user response：

Assuming every line in the log file follows the same pattern, you can use this inside your loop:

.split(' [**] ')[1].split(' ', 1)

So your final code would look like this:

for line in infile:
    # message[0] is the bracketed ID, message[1] is the alert text
    message = line.split(' [**] ')[1].split(' ', 1)
    print("MESSAGE: {}".format(message[1]))
    print("SID: {}".format(message[0]))

And this would be your final result:

MESSAGE: SURICATA STREAM FIN out of window
SID: [1:2210038:2]
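
The split above still leaves the brackets and the trailing `:2` on the ID. To get the exact two-line format from the question using only string methods, `str.strip` and `str.rsplit` can trim them off. This is a minimal sketch, assuming every line follows the layout shown in the question; the helper name `format_entry` is made up for illustration.

```python
def format_entry(line):
    """Return '# message' and the shortened ID on two lines."""
    # Take the text between the two ' [**] ' markers, then split off the ID.
    bracketed, message = line.split(' [**] ')[1].split(' ', 1)
    # '[1:2210038:2]' -> '1:2210038:2' -> '1:2210038'
    sid = bracketed.strip('[]').rsplit(':', 1)[0]
    return '# {}\n{}'.format(message, sid)


line = '11/10/2022-12:47:06.318702  [**] [1:2210038:2] SURICATA STREAM FIN out of window [**] [Classification: Generic Protocol Command Decode] [Priority: 3] {TCP}'
print(format_entry(line))
# prints:
# # SURICATA STREAM FIN out of window
# 1:2210038
```

Unlike the regular expression approach, this raises an error on any line that doesn't contain the ` [**] ` markers, so it may need a `try`/`except` if the log contains other kinds of lines.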