Regular expression to search string from a text file

Time:01-13

I wrote the code below to extract two values from a specific line in a text file. My text file has multiple lines of information, and I am trying to find the following line:

2022-05-03 11:15:09.395 [6489266] | (rtcp_receiver.cc:823): BwMgr Received a TMMBR with bps: 1751856

I am extracting the time (11:15:09) and the bandwidth (1751856) from the line above:

import re
import matplotlib.pyplot as plt
import sys

time =[]
bandwidth = []
myfile = open(sys.argv[1])
for line in myfile:
    line = line.rstrip()
    if re.findall('TMMBR with bps:',line):
        time.append(line[12:19])
        bandwidth.append(line[-7:])

plt.plot(time,bandwidth)
plt.xlabel('time')
plt.ylabel('bandwidth')   
plt.title('TMMBR against time')
plt.legend()
plt.show()

The problem is that I am using absolute index values (line[12:19]) to extract the data, which breaks if the line contains extra characters or extra spaces. What regular expression can I write to extract the values? I am new to regular expressions.

CodePudding user response:

You can just use split:

BPS_SEPARATOR = "TMMBR with bps: "
for line in strings:
    line = line.rstrip()
    if BPS_SEPARATOR in line:
        time.append(line.split(" ")[1])
        bandwidth.append(line.split(BPS_SEPARATOR)[1])

CodePudding user response:

  • Use a context manager to handle the file, so it is closed automatically.

  • Don't use re.findall just to check whether a pattern occurs in a string; it builds a list of every match, which is wasted work. Use re.search for that, and only when you actually need a regex.

In your case it's enough to split a line and get the needed parts:

with open(sys.argv[1]) as myfile:
    ...
    if 'TMMBR with bps:' in line:
        parts = line.split()
        time.append(parts[1][:-4])
        bandwidth.append(parts[-1])
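
For reference, the split approach above could be fleshed out roughly as follows. This is only a sketch: the function name extract_tmmbr and the in-memory sample log are made up for illustration (the sample stands in for open(sys.argv[1])), and converting the bandwidth to int is an added assumption so that a plot would order the values numerically.

```python
import io

def extract_tmmbr(lines):
    """Collect times and bandwidths from lines containing the TMMBR marker."""
    times, bandwidths = [], []
    for line in lines:
        line = line.rstrip()
        if 'TMMBR with bps:' in line:
            parts = line.split()                  # splits on any run of whitespace
            times.append(parts[1].split('.')[0])  # '11:15:09.395' -> '11:15:09'
            bandwidths.append(int(parts[-1]))     # assumption: store as a number
    return times, bandwidths

# Hypothetical sample log standing in for the real file
sample = io.StringIO(
    "2022-05-03 11:15:09.395 [6489266] | (rtcp_receiver.cc:823): "
    "BwMgr Received a TMMBR with bps: 1751856\n"
    "2022-05-03 11:15:10.001 [6489266] | some unrelated line\n"
)

times, bandwidths = extract_tmmbr(sample)
print(times, bandwidths)
```

Because split() with no argument collapses runs of whitespace, extra spaces between fields do not shift the positions of parts[1] and parts[-1].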

CodePudding user response:

Try this:

(?:\d+:\d+:|(?<=TMMBR with bps: ))\d+
  • (?:\d+:\d+:|(?<=TMMBR with bps: )) non-capturing group.

    • \d+: one or more digits followed by a colon :.
    • \d+: one or more digits followed by a colon :.
    • | OR
    • (?<=TMMBR with bps: ) a position that is preceded by the text TMMBR with bps: .
  • \d+ one or more digits.

See regex demo

import re

txt1 = '2022-05-03 11:15:09.395 [6489266] | (rtcp_receiver.cc:823): BwMgr Received a TMMBR with bps: 1751856'

res = re.findall(r'(?:\d+:\d+:|(?<=TMMBR with bps: ))\d+', txt1)

print(res[0]) #Output: 11:15:09

print(res[1]) #Output: 1751856

CodePudding user response:

You can make the match more specific with 2 capture groups:

^\d{4}-\d\d-\d\d\s+(\d\d:\d\d:\d\d)\.\d+.*\bTMMBR with bps:\s*(\d+)$

See a regex101 demo.

import re

s = r"2022-05-03 11:15:09.395 [6489266] | (rtcp_receiver.cc:823): BwMgr Received a TMMBR with bps: 1751856"
pattern = r"\d{4}-\d\d-\d\d\s+(\d\d:\d\d:\d\d)\.\d+.*\bTMMBR with bps:\s*(\d+)$"
m = re.search(pattern, s)
if m:
    print(m.groups())

Output

('11:15:09', '1751856')
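
To apply the two-capture-group idea to every line of a log, something like the sketch below might work. The names TMMBR_RE and parse_line and the sample lines are illustrative only, and converting the bandwidth to int is an assumption made with plotting in mind. The `\s+` and `\s*` parts are what let the pattern tolerate extra spaces.

```python
import re

# Two capture groups: the hh:mm:ss time and the bps value.
# Compiled once so it can be reused for every line.
TMMBR_RE = re.compile(
    r"^\d{4}-\d\d-\d\d\s+(\d\d:\d\d:\d\d)\.\d+.*\bTMMBR with bps:\s*(\d+)$"
)

def parse_line(line):
    """Return (time, bandwidth) for a TMMBR line, or None for any other line."""
    m = TMMBR_RE.search(line.rstrip())
    if m is None:
        return None
    return m.group(1), int(m.group(2))

# Hypothetical sample lines; the second one has extra spaces in two places
lines = [
    "2022-05-03 11:15:09.395 [6489266] | (rtcp_receiver.cc:823): BwMgr Received a TMMBR with bps: 1751856",
    "2022-05-03   11:15:10.120 [6489266] | (rtcp_receiver.cc:823): BwMgr Received a TMMBR with bps:    1800000  ",
    "2022-05-03 11:15:11.000 [6489266] | unrelated",
]

results = [r for r in map(parse_line, lines) if r is not None]
print(results)
```

Lines that do not match are skipped, so the same function can be used directly inside the loop over the file.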