Custom pattern matching in Python

Time:12-02

I am trying to write a simple Python program that reads a log file and extracts specific values. This is the kind of log line I want to look for:

2022-12-02 13:13:10.539 [metrics-writer-1] [INFO ] metrics - type=GAUGE, name=Topic.myTopic1.TotalIncomingBytes.Count, value=20725269

I have many topics, such as myTopic2, myTopic3, etc.

I want to detect all lines that report the total incoming bytes for the various topics and extract the value. Is there an easy and efficient way to do so? Basically, I want to match the following pattern:

2022-12-02 13:13:10.539 [metrics-writer-1] [INFO ] metrics - type=GAUGE, name=Topic.${}.TotalIncomingBytes.Count, value=${}

Ignoring the timestamp, of course.

CodePudding user response:

Maybe something like this:

resultLines = []
resultSums = {}
with open('recent.logs') as f:
    for idx, line in enumerate(f):
        pieces = line.rsplit('.TotalIncomingBytes.Count, value=', 1)
        if len(pieces) != 2: continue

        value = pieces[1]

        pieces = pieces[0].rsplit(' [metrics-writer-1] [INFO ] metrics - type=GAUGE, name=Topic.', 1)
        if len(pieces) != 2: continue

        topic = pieces[1]
        value = int(value)

        resultLines.append({
            'idx': idx,
            'line': line,
            'topic': topic,
            'value': value,
        })

        if topic not in resultSums:
            resultSums[topic] = 0
        resultSums[topic] = resultSums[topic] + value

for topic, value in resultSums.items():
    print(topic, value)

CodePudding user response:

Here's the way I would do it. This could also be done with a regular expression.

data = """\
2022-12-02 13:13:10.539 [metrics-writer-1] [INFO ] metrics - type=GAUGE, name=Topic.myTopic1.TotalIncomingBytes.Count, value=20725269
2022-12-02 13:13:10.539 [metrics-writer-1] [INFO ] metrics - type=GAUGE, name=Topic.myTopic1.TotalIncomingBytes.Count, value=20725269
2022-12-02 13:13:10.539 [metrics-writer-1] [INFO ] metrics - type=GAUGE, name=Topic.myTopic1.TotalIncomingBytes.Count, value=20725269
"""

counts = {}

for line in data.splitlines():
    if '[INFO ] metrics' in line:
        parts = line.split(' - ')
        parts = parts[1].split(', ')
        dct = {}
        for part in parts:
            key, val = part.split('=', 1)  # split on the first '=' only
            dct[key] = val
        if dct['name'] not in counts:
            counts[dct['name']] = int(dct['value'])
        else:
            counts[dct['name']] += int(dct['value'])

print(counts)

Output:

{'Topic.myTopic1.TotalIncomingBytes.Count': 62175807}

Here's a regex version:


import re

pattern = re.compile(r".* - type=([^,]*), name=([^,]*), value=([^,]*)")
counts = {}

for line in data.splitlines():
    if '[INFO ] metrics' in line:
        parts = pattern.match(line)
        if parts is None:
            continue  # metrics line that does not fit the expected layout
        if parts[2] not in counts:
            counts[parts[2]] = int(parts[3])
        else:
            counts[parts[2]] += int(parts[3])

print(counts)
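
If only the TotalIncomingBytes lines matter, as in the question, a tighter pattern can pull out the topic name and the value directly. This is just a minimal sketch: it assumes topic names contain no dots and values are non-negative integers. The helper name extract_incoming_bytes is made up for illustration, and the second sample line is invented to show a non-match.

```python
import re

# Assumption: topic names contain no dots, commas, or whitespace,
# and the value is a non-negative integer.
LINE_RE = re.compile(
    r"type=GAUGE, name=Topic\.(?P<topic>[^.,\s]+)"
    r"\.TotalIncomingBytes\.Count, value=(?P<value>\d+)"
)

def extract_incoming_bytes(lines):
    """Yield (topic, value) for each line that matches the pattern."""
    for line in lines:
        m = LINE_RE.search(line)  # search() skips the timestamp prefix
        if m:
            yield m.group("topic"), int(m.group("value"))

sample = [
    # Line taken from the question.
    "2022-12-02 13:13:10.539 [metrics-writer-1] [INFO ] metrics - "
    "type=GAUGE, name=Topic.myTopic1.TotalIncomingBytes.Count, value=20725269",
    # Invented line that should be ignored.
    "2022-12-02 13:13:10.539 [metrics-writer-1] [INFO ] some other message",
]

print(list(extract_incoming_bytes(sample)))
```

Because the function accepts any iterable of lines, an open file object can be passed in directly instead of the sample list.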