I am writing a Python program to calculate page statistics. I need to match the keywords in the order they appear. If the word 'PRECHARGE' appears before 'ACTIVE' (i.e. from line 1 to line 2), the value of A should be 1, as shown in the script below. If the word 'PRECHARGE' appears after 'ACTIVE' (i.e. from line 3 to line 4), the value of error should be 1.
So in this example, I should get A = 1 and error = 1. However, when I print A and error, I get A = 2, which is wrong. How can I resolve this logic error?
import re
A = 0
error = 0
lines = open("page_stats_ver2.txt", "r").readlines()
for line in lines:
    if re.search(r"ACTIVE", line):
        if re.search(r"PRECHARGE", line):
            error = 1
        else:
            A = 1
print(A)
print(error)
CodePudding user response:
Taking your specification literally:
from io import StringIO
example = """
1 ACTIVE
2 PRECHARGE
3 OTHER
4 PRECHARGE
5 ACTIVE
6 OTHER
7 ACTIVE
8 OTHER
9 PRECHARGE
10 OTHER
11 ACTIVE
12 PRECHARGE
13 ACTIVE
"""
# in the above A would be 2 (lines 4 & 12)
# and error also 2 (lines 2 & 12)
# none of the other lines count, since there's always an 'OTHER'
# and 12 counts for both because it has ACTIVE both before and after
# (that's probably not exactly what you need, but definitely what you specified)
# using StringIO instead of open("somefile.txt"), to get a file-like object
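with StringIO(example) as f:
    prev = ''  # keyword found on the previous line, or '' if it had neither
    a = 0
    error = 0
    for line in f:
        if 'ACTIVE' in line:
            # ACTIVE directly after a PRECHARGE line
            if prev == 'PRECHARGE':
                a += 1
            prev = 'ACTIVE'
        elif 'PRECHARGE' in line:
            # PRECHARGE directly after an ACTIVE line
            if prev == 'ACTIVE':
                error += 1
            prev = 'PRECHARGE'
        else:
            prev = ''
    print(a, error)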
However, I'm not so sure you'd want lines 11-13 to count as both error and a. You'll have to be more specific and provide an example with the expected outcome to clarify.
It's also not clear whether you really only meant to count consecutive lines, or whether you expect a and error to have higher values in the given example because you just want to ignore 'OTHER' lines (i.e. lines with neither ACTIVE nor PRECHARGE).
And finally, it's unclear whether both terms could appear on the same line, and what the expected behaviour would be in that case.
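If it turns out you do want to ignore the 'OTHER' lines, one possible variant is to remember the last keyword seen instead of resetting it. This is only a sketch under that assumption; count_transitions is a name I made up for illustration, and a line containing both words is treated as ACTIVE here, which may not be what you want.

```python
from io import StringIO

def count_transitions(lines):
    """Count PRECHARGE->ACTIVE (a) and ACTIVE->PRECHARGE (error) transitions,
    skipping lines that contain neither keyword (hypothetical helper)."""
    a = 0
    error = 0
    prev = ''  # last keyword seen so far
    for line in lines:
        if 'ACTIVE' in line:
            current = 'ACTIVE'  # assumption: ACTIVE wins if both words are present
        elif 'PRECHARGE' in line:
            current = 'PRECHARGE'
        else:
            continue  # ignore lines with neither keyword
        if prev == 'PRECHARGE' and current == 'ACTIVE':
            a += 1
        elif prev == 'ACTIVE' and current == 'PRECHARGE':
            error += 1
        prev = current
    return a, error

example = """
1 ACTIVE
2 PRECHARGE
3 OTHER
4 PRECHARGE
5 ACTIVE
6 OTHER
7 ACTIVE
8 OTHER
9 PRECHARGE
10 OTHER
11 ACTIVE
12 PRECHARGE
13 ACTIVE
"""

print(count_transitions(StringIO(example)))  # (3, 3)
```

With the 'OTHER' lines skipped, lines 7 and 9 and lines 9 and 11 now count as well, so both totals go up from 2 to 3.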
CodePudding user response:
You can do it this way:
import re
lines = """
--------------------------------------------------------
--------------------------------------------------------
|1| | PRECHARGE |
|2| | ACTIVE |
|3| | ACTIVE |
|4| | PRECHARGE |
|5| | READING |
|6| | READING |
|1| | PRECHARGE |
|2| | ACTIVE |
|3| | ACTIVE |
|1| | PRECHARGE |
"""
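# Count each PRECHARGE line that is immediately followed by an ACTIVE line
A = len(re.findall(r"PRECHARGE.*\n.*ACTIVE", lines))
# Count each ACTIVE line that is immediately followed by a PRECHARGE line
error = len(re.findall(r"\n.*ACTIVE.*\n.*PRECHARGE", lines))
# This works as well: error = len(re.findall(r"ACTIVE.*\n.*PRECHARGE", lines))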
print(A)
print(error)
Output:
2
2
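Note that these patterns span two lines, so they need the whole text as one string, i.e. read() rather than readlines(). A minimal sketch of how that could look with your file follows; count_pairs and count_pairs_in_file are hypothetical helper names, and the four-line sample is my guess at the layout described in the question.

```python
import re

def count_pairs(text):
    """Return (A, error) for a multi-line string (hypothetical helper)."""
    # PRECHARGE on one line, ACTIVE on the very next line
    a = len(re.findall(r"PRECHARGE.*\n.*ACTIVE", text))
    # ACTIVE on one line, PRECHARGE on the very next line
    error = len(re.findall(r"ACTIVE.*\n.*PRECHARGE", text))
    return a, error

def count_pairs_in_file(path):
    """Read the whole file as one string so the patterns can cross line breaks."""
    with open(path, "r") as f:
        return count_pairs(f.read())

# Assumed layout of the four lines from the question
sample = (
    "|1| | PRECHARGE |\n"
    "|2| | ACTIVE |\n"
    "|3| | ACTIVE |\n"
    "|4| | PRECHARGE |\n"
)
print(count_pairs(sample))  # (1, 1)

# With the real file: A, error = count_pairs_in_file("page_stats_ver2.txt")
```

Keep in mind that re.findall only returns non-overlapping matches for each pattern, so it is worth checking the counts against a small hand-checked sample first.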