I have this logfile:
19-3-2020 01:37:31.995 INFO 18 188 mailbox allocated for rsvp
19-3-2020 01:37:32.039 INFO 14 194 creating mailslot for dump
19-3-2020 01:37:32.082 INFO 18 194 out of INFO allcations
19-3-2020 01:37:32.119 INFO 18 188 creating mailslot for RSVP client API
19-3-2020 01:37:32.157 INFO 10 187 creating socket for traffic CONTROL module
19-3-2020 01:37:32.157 INFO 19 186 transaction 17327 begin
19-3-2020 01:37:32.276 INFO 11 188 loopback to avoid ERROR
19-3-2020 01:37:32.276 INFO 15 187 end transaction 17327
19-3-2020 01:37:32.314 INFO 13 189 creating mailslot for terminate
I need to count the number of transactions that have both a beginning and an end.
I tried using defaultdict from the collections module, but I only have the first few lines because I'm not sure what to do next:
from collections import defaultdict
transactions = defaultdict(dict)
with open('logfile.log', 'r') as f:
CodePudding user response:
You can do it like this:
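with open('logfile.log', 'r') as f:
    lines = f.readlines()

beginnings = set()  # IDs seen on "transaction <id> begin" lines
endings = set()     # IDs seen on "end transaction <id>" lines
for line in lines:
    if "transaction" in line:
        if "begin" in line:
            # "... transaction 17327 begin": the ID is the second-to-last token
            transaction_id = line.split(" ")[-2]
            beginnings.add(int(transaction_id))
        elif "end" in line:
            # "... end transaction 17327": the ID is the last token
            transaction_id = line.split(" ")[-1]
            endings.add(int(transaction_id))

# IDs that have both a begin line and an end line
intersect = beginnings.intersection(endings)
print(intersect)
print(len(intersect))  # number of complete transactions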
Here I created two sets, which hold only unique elements. After the loop, I used the set intersection method to find the transaction IDs common to both; the size of that intersection is the count you need.
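One thing the intersection does not check is ordering: an "end" line that appears before its "begin" line would still count. If that matters for your log, a possible variation is to track which transactions are currently open. This is only a sketch; the helper name `count_completed`, the two regular expressions, and the last line of `sample` are assumptions made for illustration, based on the line formats shown in the question.

```python
import re

# Assumed line formats, taken from the sample log in the question.
BEGIN_RE = re.compile(r"transaction (\d+) begin")
END_RE = re.compile(r"end transaction (\d+)")

def count_completed(lines):
    """Count transactions whose end line appears after their begin line."""
    open_ids = set()   # transactions that have begun but not yet ended
    completed = 0
    for line in lines:
        begin = BEGIN_RE.search(line)
        end = END_RE.search(line)
        if begin:
            open_ids.add(begin.group(1))
        elif end and end.group(1) in open_ids:
            open_ids.remove(end.group(1))
            completed += 1
    return completed

sample = [
    "19-3-2020 01:37:32.157 INFO 19 186 transaction 17327 begin",
    "19-3-2020 01:37:32.276 INFO 11 188 loopback to avoid ERROR",
    "19-3-2020 01:37:32.276 INFO 15 187 end transaction 17327",
    # Made-up line for demonstration: an end without a matching begin.
    "19-3-2020 01:37:32.314 INFO 15 187 end transaction 99999",
]
print(count_completed(sample))
```

To use it on the real file, you could pass the open file object directly, e.g. `count_completed(f)` inside the `with open(...)` block.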
CodePudding user response:
Here's something that works (it uses the log as a string, but you can adapt it to read the lines of the log file directly; I've added a fake transaction for demonstration purposes):
Note that I assumed no two transactions can share the same ID; if they can, using sets will not work.
import re
log = '''
19-3-2020 01:37:31.995 INFO 18 188 mailbox allocated for rsvp
19-3-2020 01:37:32.039 INFO 14 194 creating mailslot for dump
19-3-2020 01:37:32.082 INFO 18 194 out of INFO allcations
19-3-2020 01:37:32.119 INFO 18 188 creating mailslot for RSVP client API
19-3-2020 01:37:32.157 INFO 10 187 creating socket for traffic CONTROL module
19-3-2020 01:37:32.157 INFO 19 186 transaction 17327 begin
19-3-2020 01:37:32.276 INFO 11 188 loopback to avoid ERROR
19-3-2020 01:37:32.276 INFO 15 187 end transaction 17327
19-3-2020 01:37:32.314 INFO 13 189 creating mailslot for terminate
19-3-2020 01:37:32.157 INFO 19 186 transaction 123456789 begin
'''
tr_begun = set()
tr_ended = set()
for line in log.splitlines():
    # Match the digits that follow "transaction " (one or more digits)
    tr_id = re.search(r'(?<=transaction )[0-9]+', line)
    if tr_id:
        tr_id = tr_id[0]
        if 'begin' in line:
            tr_begun.add(tr_id)
        if 'end' in line:
            tr_ended.add(tr_id)
inter = tr_begun.intersection(tr_ended)
print(tr_begun, tr_ended, inter, len(inter))
# {'123456789', '17327'} {'17327'} {'17327'} 1 # The last number is what you want.
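If the uniqueness assumption noted above does not hold and the same ID can begin and end several times, one option is to count occurrences instead of storing IDs in sets. The following is a rough sketch, not a drop-in replacement: the function name `count_pairs` and the sample lines are made up for illustration, and it pairs begins with ends by count only, without checking their order.

```python
import re
from collections import Counter

def count_pairs(lines):
    """Count begin/end pairs per ID, allowing repeated IDs."""
    begun = Counter()
    ended = Counter()
    for line in lines:
        match = re.search(r"transaction (\d+) begin", line)
        if match:
            begun[match.group(1)] += 1
            continue
        match = re.search(r"end transaction (\d+)", line)
        if match:
            ended[match.group(1)] += 1
    # Counter & Counter keeps, for each ID, the smaller of the two counts.
    return sum((begun & ended).values())

# Made-up lines: ID 17327 is used twice, ID 42 never ends.
demo = [
    "transaction 17327 begin",
    "end transaction 17327",
    "transaction 17327 begin",
    "end transaction 17327",
    "transaction 42 begin",
]
print(count_pairs(demo))
```

With the demo lines, ID 17327 contributes two pairs and ID 42 contributes none.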