Regex Log Parsing - how to begin parsing lines at one event and stop parsing lines at another event?

Time:10-19

I have a log file that tracks game events line by line, but it also includes events that happen outside of 'official' game time (pregame, etc.). I have predefined regex patterns that read and parse each event and aggregate the stats, but the totals include the excess stats recorded before and after the official game.

My stat aggregation is fine; what I am struggling with is parsing only the range between two events. There is no explicit 'game start' event, but there is a 'round start' event, which is logged for every round started during the match. Game over is simpler, as 'game over' is itself a logged event.

If I am able to read the 'round start' event, and the 'game over' event, how would I be able to begin reading lines in the file at the first instance of 'round start', and finish reading lines once game over has been triggered?

eg:
line 37 | trigger "(Round_Start)"  <-- begin parsing here
...
line 192 | trigger "(Round_Start)"
...
line 304 | trigger "(Round_Start)"
...
line 486 | trigger "(Round_Start)"
...
line 594 | trigger "(Game_Over)"    <-- finish parsing here

See some code below which may help.

import re

dmgEvent_P = re.compile(r'"([\w\s]+)<.*hurt "([\w\s]+)<.*\(dmg "(\d+)"')
hpEvent_P = re.compile(r'"([\w\s]+)<.*healed "([\w\s]+)<.*\(hp "(\d+)"')
roundStart_P = re.compile(r'trigger "(Round_Start)"')
gameOver_P = re.compile(r'trigger "(Game_Over)"')



# Aggregate every damage event found in the whole log
matches = dmgEvent_P.finditer(contents)
for match in matches:
    dealer = match.group(1)
    receiver = match.group(2)
    dmg = int(match.group(3))
    modifyDMG(dealer, receiver, dmg)

# Aggregate every heal event (hpEvent_P is the pattern compiled above)
matches = hpEvent_P.finditer(contents)
for match in matches:
    dealer = match.group(1)
    receiver = match.group(2)
    hp = int(match.group(3))
    modifyHP(dealer, receiver, hp)

There are other ingame events being tracked but they function very similarly.

As it stands, my code runs each regex over the log's entire contents separately, rather than processing the file line by line with all patterns together. I would like to parse the lines collectively, within the range I've defined above.

CodePudding user response:

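You can match the first line that contains trigger "(Round_Start)" and then all following lines that do not contain (Game_Over). Enable the multiline flag (re.M in Python) so that ^ matches at the start of every line: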

^.*?\btrigger "\(Round_Start\)".*(?:\n(?!.*\(Game_Over\)).*)*
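As a minimal sketch of how this pattern might be applied in Python: the sample log below is hypothetical (only the trigger lines follow the format from the question), and the names sample_log, range_P and in_game_lines are illustrative assumptions, not part of the original code.

```python
import re

# Hypothetical sample log; only the trigger lines follow the question's format.
sample_log = "\n".join([
    'pregame line that should be skipped',
    'World trigger "(Round_Start)"',
    'first round event',
    'World trigger "(Round_Start)"',
    'second round event',
    'World trigger "(Game_Over)"',
    'postgame line that should be skipped',
])

# re.MULTILINE lets ^ match at the start of every line, not only the string.
range_P = re.compile(
    r'^.*?\btrigger "\(Round_Start\)".*(?:\n(?!.*\(Game_Over\)).*)*',
    re.MULTILINE,
)

# search() returns the leftmost match, i.e. the block that starts at the
# first Round_Start line and stops before the first Game_Over line.
match = range_P.search(sample_log)
in_game_lines = match.group(0).splitlines() if match else []

for line in in_game_lines:
    print(line)
```

Note that if the log contains no (Game_Over) line at all, this pattern keeps matching until the end of the input.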


If a (Game_Over) line must be present after the block, you can capture the preceding lines in a capture group and then match the following line that contains (Game_Over):

^(.*?\btrigger "\(Round_Start\)".*(?:\n(?!.*\(Game_Over\)).*)*)\n.*?\btrigger "\(Game_Over\)"

The pattern matches:

  • ^ Start of the line (with the multiline flag enabled; without it, only the start of the string)
  • ( Capture group 1
    • .*?\btrigger "\(Round_Start\)".* Match a whole line that contains trigger "(Round_Start)"
    • (?: Non capture group to repeat as a whole part
      • \n(?!.*\(Game_Over\)).* Match a newline and the rest of the line, provided the line does not contain (Game_Over); this is checked with a negative lookahead. If you want to exclude more lines, you can use (?!.*(?:\(Game_Over\)|another string))
    • )* Close non capture group and optionally repeat
  • ) Close capture group 1
  • \n.*?\btrigger "\(Game_Over\)" Match a newline, and match trigger "(Game_Over)" in the line
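The breakdown above could be put to use roughly as follows. This is a sketch under stated assumptions: the log lines are invented for illustration, dmgEvent_P is the damage pattern from the question (with its + quantifiers), and game_P and damage_dealt are hypothetical names. Group 1 holds the in-game block, which is then processed line by line so that every event pattern could be tried on each line in one pass.

```python
import re
from collections import defaultdict

# Hypothetical log: one damage event before, two during, one after the game.
sample_log = "\n".join([
    '"Alice<1>" hurt "Bob<2>" (dmg "10")',
    'World trigger "(Round_Start)"',
    '"Alice<1>" hurt "Bob<2>" (dmg "25")',
    'World trigger "(Round_Start)"',
    '"Bob<2>" hurt "Alice<1>" (dmg "40")',
    'World trigger "(Game_Over)"',
    '"Alice<1>" hurt "Bob<2>" (dmg "99")',
])

# Group 1 captures everything from the first Round_Start line up to,
# but not including, the Game_Over line.
game_P = re.compile(
    r'^(.*?\btrigger "\(Round_Start\)".*(?:\n(?!.*\(Game_Over\)).*)*)'
    r'\n.*?\btrigger "\(Game_Over\)"',
    re.MULTILINE,
)
dmgEvent_P = re.compile(r'"([\w\s]+)<.*hurt "([\w\s]+)<.*\(dmg "(\d+)"')

# Total damage per (dealer, receiver) pair, counted inside the game only.
damage_dealt = defaultdict(int)

game = game_P.search(sample_log)
if game:
    for line in game.group(1).splitlines():
        event = dmgEvent_P.search(line)
        if event:
            dealer, receiver, dmg = event.groups()
            damage_dealt[(dealer, receiver)] += int(dmg)

print(dict(damage_dealt))
```

The other event patterns (healing and so on) could be tested inside the same loop, so the block is read once instead of once per pattern.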


Note that the parentheses must be escaped as \( and \). Otherwise "(Round_Start)" will not match literally: an unescaped (Round_Start) is a capture group that matches and captures the text Round_Start without the surrounding parentheses.
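A small sketch of the difference, using a hypothetical log line; the variable names are illustrative. It also shows re.escape from the standard library, which escapes a literal string so it can be used as a pattern.

```python
import re

line = 'World trigger "(Round_Start)"'   # hypothetical log line
plain = 'World trigger "Round_Start"'    # same text without parentheses

unescaped_P = re.compile(r'trigger "(Round_Start)"')   # ( ) form a capture group
escaped_P = re.compile(r'trigger "\(Round_Start\)"')   # \( \) match literal parentheses

# The unescaped pattern expects R right after the quote, so it misses the real line.
unescaped_hit = unescaped_P.search(line)
# The escaped pattern matches the line as logged.
escaped_hit = escaped_P.search(line)
# The unescaped pattern only matches text that has no parentheses at all.
plain_hit = unescaped_P.search(plain)

# re.escape builds an equivalent literal pattern without manual escaping.
auto_P = re.compile(re.escape('trigger "(Round_Start)"'))
auto_hit = auto_P.search(line)

print(unescaped_hit, escaped_hit.group(0), plain_hit.group(1), auto_hit.group(0))
```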
