I wrote the code below to extract two values from a specific line in a text file. My text file has multiple lines of information, and I am trying to find the following line:
2022-05-03 11:15:09.395 [6489266] | (rtcp_receiver.cc:823): BwMgr Received a TMMBR with bps: 1751856
I am extracting the time (11:15:09) and the bandwidth (1751856) from the above line:
import re
import matplotlib.pyplot as plt
import sys
time =[]
bandwidth = []
myfile = open(sys.argv[1])
for line in myfile:
    line = line.rstrip()
    if re.findall('TMMBR with bps:',line):
        time.append(line[12:19])
        bandwidth.append(line[-7:])
plt.plot(time,bandwidth)
plt.xlabel('time')
plt.ylabel('bandwidth')
plt.title('TMMBR against time')
plt.legend()
plt.show()
The problem here is that I am using absolute index values (line[12:19]) to extract the data, which doesn't work if the line has extra characters or extra spaces. What regular expression can I write to extract the values? I am new to regular expressions.
CodePudding user response:
You can just use split:
BPS_SEPARATOR = "TMMBR with bps: "
# strings is any iterable of lines, for example the open file object
for line in strings:
    line = line.rstrip()
    if BPS_SEPARATOR in line:
        time.append(line.split(" ")[1])
        bandwidth.append(line.split(BPS_SEPARATOR)[1])
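Since the question mentions lines with extra spaces, here is a minimal sketch of the same split idea that tolerates them. The function name parse_tmmbr_line is made up for illustration, and it assumes the time is always the second whitespace-separated field. Note that, like the snippet above, it keeps the milliseconds on the time value.

```python
BPS_SEPARATOR = "TMMBR with bps:"


def parse_tmmbr_line(line):
    """Return (time, bps) as strings for a TMMBR line, or None for any other line."""
    # partition() splits once around the separator; sep is empty if it is absent.
    head, sep, tail = line.partition(BPS_SEPARATOR)
    if not sep:
        return None
    # split() with no argument collapses runs of whitespace, unlike split(" ").
    fields = head.split()
    if len(fields) < 2 or not tail.strip():
        return None
    # Assumption: the time is the second field and the bps value directly
    # follows the separator.
    return fields[1], tail.split()[0]
```

Called on the sample line from the question, this should give the time with milliseconds and the bps value as strings; convert the second item with int() before plotting if you want a numeric axis.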
CodePudding user response:
Use a context manager for handling the file.
Don't use re.findall just to check whether a pattern occurs in a string; it's not efficient. Use re.search instead for regex cases.
In your case it's enough to split a line and get the needed parts:
with open(sys.argv[1]) as myfile:
    ...
    if 'TMMBR with bps:' in line:
        parts = line.split()
        time.append(parts[1][:-4])
        bandwidth.append(parts[-1])
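Put together, the advice above might look roughly like the sketch below. The function name read_tmmbr is hypothetical, and it assumes the bps value is always the last token on the line. It also converts the bandwidth to int, which the original code does not do, so that matplotlib treats the y-axis as numeric rather than as categories.

```python
def read_tmmbr(path):
    """Collect (times, bandwidth) from every TMMBR line in the file at path."""
    times, bandwidth = [], []
    # The context manager closes the file even if parsing raises.
    with open(path) as myfile:
        for line in myfile:
            # A plain substring test is enough here; no regex needed.
            if 'TMMBR with bps:' in line:
                parts = line.split()
                # Drop the fractional seconds: '11:15:09.395' -> '11:15:09'.
                times.append(parts[1].split('.')[0])
                # Assumption: the bps value is the last token on the line.
                bandwidth.append(int(parts[-1]))
    return times, bandwidth
```

You would then call times, bandwidth = read_tmmbr(sys.argv[1]) and pass the two lists to plt.plot as before.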
CodePudding user response:
Try this:
(?:\d+:\d+:|(?<=TMMBR with bps: ))\d+
Explanation:
(?:\d+:\d+:|(?<=TMMBR with bps: ))   a non-capturing group with two alternatives:
    \d+:   one or more digits followed by a colon.
    \d+:   one or more digits followed by a colon.
    |   OR
    (?<=TMMBR with bps: )   a position that is preceded by the text "TMMBR with bps: ".
\d+   one or more digits.
See regex demo
import re
txt1 = '2022-05-03 11:15:09.395 [6489266] | (rtcp_receiver.cc:823): BwMgr Received a TMMBR with bps: 1751856'
res = re.findall(r'(?:\d+:\d+:|(?<=TMMBR with bps: ))\d+', txt1)
print(res[0]) #Output: 11:15:09
print(res[1]) #Output: 1751856
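One thing to watch when running this over a whole file: on lines without the TMMBR text, findall still returns the time on its own, so res[1] would raise an IndexError. A rough sketch of a per-line guard follows; the helper name extract_pairs is invented for this example.

```python
import re

# Same pattern as above, compiled once because it is reused for every line.
PATTERN = re.compile(r'(?:\d+:\d+:|(?<=TMMBR with bps: ))\d+')


def extract_pairs(lines):
    """Return a list of (time, bps) string pairs, one per TMMBR line."""
    pairs = []
    for line in lines:
        found = PATTERN.findall(line)
        # Only accept lines that contain the marker and gave exactly two matches.
        if 'TMMBR with bps: ' in line and len(found) == 2:
            pairs.append((found[0], found[1]))
    return pairs
```

Because it accepts any iterable of lines, you can pass it the open file object directly.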
CodePudding user response:
You can make the match more specific with 2 capture groups:
^\d{4}-\d\d-\d\d\s+(\d\d:\d\d:\d\d)\.\d+.*\bTMMBR with bps:\s*(\d+)$
See a regex101 demo.
import re
s = r"2022-05-03 11:15:09.395 [6489266] | (rtcp_receiver.cc:823): BwMgr Received a TMMBR with bps: 1751856"
pattern = r"^\d{4}-\d\d-\d\d\s+(\d\d:\d\d:\d\d)\.\d+.*\bTMMBR with bps:\s*(\d+)$"
m = re.search(pattern, s)
if m:
    print(m.groups())
Output
('11:15:09', '1751856')
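If you want values that are ready to plot, a variation on the same pattern with named groups could convert them as it goes. This is only a sketch: the names LINE_RE and parse_line are made up, and it assumes every matching line starts with the date with no leading whitespace.

```python
import re
from datetime import datetime

# Same idea as the pattern above, with named groups and the date captured too.
LINE_RE = re.compile(
    r"^(?P<date>\d{4}-\d\d-\d\d)\s+(?P<time>\d\d:\d\d:\d\d)\.\d+"
    r".*\bTMMBR with bps:\s*(?P<bps>\d+)\s*$"
)


def parse_line(line):
    """Return (datetime, bps as int) for a TMMBR line, or None otherwise."""
    m = LINE_RE.search(line)
    if m is None:
        return None
    stamp = datetime.strptime(f"{m['date']} {m['time']}", "%Y-%m-%d %H:%M:%S")
    return stamp, int(m['bps'])
```

Plotting datetime objects against ints lets matplotlib space the points by actual time and scale the bandwidth axis numerically, instead of treating both as text labels.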