How do I grep the sequence line by line in Python?

Time:09-07

I am writing a Python program to compute page statistics. I need to grep the keywords in sequence, line by line. If 'PRECHARGE' appears on the line before 'ACTIVE' (e.g. lines 1 to 2), A should be incremented by one, as in the script below. If 'PRECHARGE' appears on the line after 'ACTIVE' (e.g. lines 3 to 4), error should be incremented by one.

So in this example I should get A = 1 and error = 1. But when I print A and error, I get A = 2, which is wrong. How can I fix this logic error?

import re
A = 0
error = 0
lines = open("page_stats_ver2.txt", "r").readlines()
for line in lines:
    if re.search(r"ACTIVE", line):
        if re.search(r"PRECHARGE", line):
            error += 1
        else:
            A += 1
print(A)
print(error)

Text File

CodePudding user response:

Taking your specification literally:

from io import StringIO

example = """
 1  ACTIVE
 2  PRECHARGE
 3  OTHER
 4  PRECHARGE
 5  ACTIVE
 6  OTHER
 7  ACTIVE
 8  OTHER
 9  PRECHARGE
10  OTHER
11  ACTIVE
12  PRECHARGE
13  ACTIVE
"""
# in the above A would be 2 (lines 4 & 12)
# and error also 2 (lines 2 & 12)
# none of the other lines count, since there's always an 'OTHER'
# and 12 counts for both because it has ACTIVE both before and after
# (that's probably not exactly what you need, but definitely what you specified)

# using StringIO instead of open("somefile.txt"), to get a file-like object
with StringIO(example) as f:
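    prev = ''  # keyword seen on the previous line, or '' if it had neither
    a = 0
    error = 0
    for line in f:
        if 'ACTIVE' in line:
            # ACTIVE directly after PRECHARGE counts towards a
            if prev == 'PRECHARGE':
                a += 1
            prev = 'ACTIVE'
        elif 'PRECHARGE' in line:
            # PRECHARGE directly after ACTIVE counts as an error
            if prev == 'ACTIVE':
                error += 1
            prev = 'PRECHARGE'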
        else:
            prev = ''

print(a, error)

However, I'm not so sure you'd want lines 11-13 to count as both error and a. But you'll have to be more specific and provide an example with expected outcome to clarify.

It's also not clear whether you really meant to count only consecutive lines, or whether you expect a and error to have higher values in the given example because you just want to ignore 'OTHER' lines (i.e. lines with neither ACTIVE nor PRECHARGE).

And finally, it's unclear if both terms could appear on the same line and what the expected behaviour is then.
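In case ignoring the 'OTHER' lines is what you're after, here is a minimal sketch of that variant. The function name count_transitions and the skip_other switch are made up for illustration, and a line containing both keywords is simply treated as ACTIVE because of the if/elif order, which may or may not be what you want.

```python
def count_transitions(lines, skip_other=False):
    """Return (a, error) for an iterable of lines.

    a     counts ACTIVE lines whose previous keyword was PRECHARGE
    error counts PRECHARGE lines whose previous keyword was ACTIVE

    skip_other is a hypothetical switch: when True, lines with neither
    keyword are ignored instead of resetting the previous keyword.
    """
    prev = ''
    a = 0
    error = 0
    for line in lines:
        if 'ACTIVE' in line:
            if prev == 'PRECHARGE':
                a += 1
            prev = 'ACTIVE'
        elif 'PRECHARGE' in line:
            if prev == 'ACTIVE':
                error += 1
            prev = 'PRECHARGE'
        elif not skip_other:
            # strict mode: an unrelated line breaks the sequence
            prev = ''
    return a, error


# same keyword order as the 13-line example above
sample = [
    "ACTIVE", "PRECHARGE", "OTHER", "PRECHARGE", "ACTIVE", "OTHER", "ACTIVE",
    "OTHER", "PRECHARGE", "OTHER", "ACTIVE", "PRECHARGE", "ACTIVE",
]

print(count_transitions(sample))                   # (2, 2)
print(count_transitions(sample, skip_other=True))  # (3, 3)
```

With skip_other=True the counts go up to 3 and 3, because pairs such as lines 9 and 11 now count even though an 'OTHER' line sits between them.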

CodePudding user response:

You can do it this way:

import re

lines = """
--------------------------------------------------------

--------------------------------------------------------
|1|   | PRECHARGE |
|2|   | ACTIVE    |
|3|   | ACTIVE    |
|4|   | PRECHARGE |
|5|   | READING   |
|6|   | READING   |
|1|   | PRECHARGE |
|2|   | ACTIVE    |
|3|   | ACTIVE    |
|1|   | PRECHARGE |
"""

# PRECHARGE on one line, ACTIVE on the next
A = len(re.findall(r"PRECHARGE.*\n.*ACTIVE", lines))

# ACTIVE on one line, PRECHARGE on the next
error = len(re.findall(r"\n.*ACTIVE.*\n.*PRECHARGE", lines))
# This works as well: error = len(re.findall(r"ACTIVE.*\n.*PRECHARGE", lines))

print(A)
print(error)

Output:

2
2
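As a cross-check that doesn't rely on regular expressions, the same counts can be reproduced by comparing each line with the one after it. This is only a sketch: count_pairs is a made-up helper name, and the table is the one from above with the separator lines left out.

```python
table = """\
|1|   | PRECHARGE |
|2|   | ACTIVE    |
|3|   | ACTIVE    |
|4|   | PRECHARGE |
|5|   | READING   |
|6|   | READING   |
|1|   | PRECHARGE |
|2|   | ACTIVE    |
|3|   | ACTIVE    |
|1|   | PRECHARGE |
"""


def count_pairs(text):
    """Return (A, error) by looking at every pair of adjacent lines."""
    rows = text.splitlines()
    a = 0
    error = 0
    # zip pairs each row with the row that follows it
    for current, following in zip(rows, rows[1:]):
        if 'PRECHARGE' in current and 'ACTIVE' in following:
            a += 1
        if 'ACTIVE' in current and 'PRECHARGE' in following:
            error += 1
    return a, error


print(count_pairs(table))  # (2, 2)
```

Unlike re.findall, which only returns non-overlapping matches, this checks every adjacent pair independently, so a line can take part in two pairs.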