I am trying to write a simple Python program that reads a log file and extracts specific values. This is the kind of log line I want to look out for:
2022-12-02 13:13:10.539 [metrics-writer-1] [INFO ] metrics - type=GAUGE, name=Topic.myTopic1.TotalIncomingBytes.Count, value=20725269
I have many topics, such as myTopic2, myTopic3, etc.
I want to detect all lines that show the total incoming bytes for the various topics and extract the value. Is there an easy and efficient way to do so? Basically, I want to detect the following pattern:
2022-12-02 13:13:10.539 [metrics-writer-1] [INFO ] metrics - type=GAUGE, name=Topic.${}.TotalIncomingBytes.Count, value=${}
Ignoring the timestamp, of course.
CodePudding user response:
Maybe something like this:
resultLines = []
resultSums = {}
with open('recent.logs') as f:
    for idx, line in enumerate(f):
        # Split off the value after the metric suffix; skip non-matching lines.
        pieces = line.rsplit('.TotalIncomingBytes.Count, value=', 1)
        if len(pieces) != 2:
            continue
        value = pieces[1]
        # Split off the topic name after the fixed prefix.
        pieces = pieces[0].rsplit(' [metrics-writer-1] [INFO ] metrics - type=GAUGE, name=Topic.', 1)
        if len(pieces) != 2:
            continue
        topic = pieces[1]
        value = int(value)
        resultLines.append({
            'idx': idx,
            'line': line,
            'topic': topic,
            'value': value,
        })
        if topic not in resultSums:
            resultSums[topic] = 0
        resultSums[topic] = resultSums[topic] + value
for topic, value in resultSums.items():
    print(topic, value)
CodePudding user response:
Here's the way I would do it. This could also be done with a regular expression.
data = """\
2022-12-02 13:13:10.539 [metrics-writer-1] [INFO ] metrics - type=GAUGE, name=Topic.myTopic1.TotalIncomingBytes.Count, value=20725269
2022-12-02 13:13:10.539 [metrics-writer-1] [INFO ] metrics - type=GAUGE, name=Topic.myTopic1.TotalIncomingBytes.Count, value=20725269
2022-12-02 13:13:10.539 [metrics-writer-1] [INFO ] metrics - type=GAUGE, name=Topic.myTopic1.TotalIncomingBytes.Count, value=20725269
"""
counts = {}
for line in data.splitlines():
    if '[INFO ] metrics' in line:
        # Drop everything up to ' - ', then split the key=value pairs.
        parts = line.split(' - ')
        parts = parts[1].split(', ')
        dct = {}
        for part in parts:
            key, val = part.split('=')
            dct[key] = val
        # Accumulate the value per metric name.
        if dct['name'] not in counts:
            counts[dct['name']] = int(dct['value'])
        else:
            counts[dct['name']] += int(dct['value'])
print(counts)
Output:
{'Topic.myTopic1.TotalIncomingBytes.Count': 62175807}
Here's a regex version:
import re

pattern = re.compile(r".* - type=([^,]*), name=([^,]*), value=([^,]*)")
counts = {}
for line in data.splitlines():
    if '[INFO ] metrics' in line:
        # Groups: 1 = type, 2 = name, 3 = value.
        parts = pattern.match(line)
        if parts[2] not in counts:
            counts[parts[2]] = int(parts[3])
        else:
            counts[parts[2]] += int(parts[3])
print(counts)
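If you only want the topic name rather than the full metric name, the regex can be narrowed to the exact pattern from the question. This is just a sketch: the function name extract_incoming_bytes, the named groups, and the second and third sample lines are made up for illustration, and it assumes the value is always a plain non-negative integer.

```python
import re
from collections import defaultdict

# Matches only the part of the line after the timestamp and logger prefix,
# so those are ignored. The topic may itself contain dots.
PATTERN = re.compile(
    r"name=Topic\.(?P<topic>.+?)\.TotalIncomingBytes\.Count, value=(?P<value>\d+)"
)

def extract_incoming_bytes(lines):
    """Return {topic: [values...]} for every matching line, in input order."""
    found = defaultdict(list)
    for line in lines:
        m = PATTERN.search(line)
        if m:  # lines that do not match are skipped
            found[m.group("topic")].append(int(m.group("value")))
    return dict(found)

# First line is from the question; the other two are invented test data.
sample = [
    "2022-12-02 13:13:10.539 [metrics-writer-1] [INFO ] metrics - type=GAUGE, name=Topic.myTopic1.TotalIncomingBytes.Count, value=20725269",
    "2022-12-02 13:13:11.539 [metrics-writer-1] [INFO ] metrics - type=GAUGE, name=Topic.myTopic2.TotalIncomingBytes.Count, value=42",
    "2022-12-02 13:13:12.000 [main] [INFO ] something unrelated",
]
print(extract_incoming_bytes(sample))
```

To run it on a real file, pass the open file object instead of the list, since iterating a file yields its lines one at a time. Keeping a list of values per topic leaves it up to you whether to sum them, take the latest, or something else.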