How do I get preceding and following lines when a specific string is contained in a line? (Python an

Time:01-13

Sample of text file in a directory:

text1 = "
SomethingAAA SomethingAAA SomethingAAA SomethingAAA 
SomethingBBB SomethingBBB SomethingBBB SomethingBBB 
SomethingCCC SomethingCCC SomethingCCC SomethingCCC 
SomethingDDD SomethingDDD SomethingDDD SomethingDDD 
BlahBlah SomethingXXX BlahBlah BlahBlah BlahBlah BlahBlah 
SomethingEEE SomethingEEE SomethingEEE SomethingEEE 
SomethingFFF SomethingFFF SomethingFFF SomethingFFF 
SomethingGGG SomethingGGG SomethingGGG SomethingGGG 
BlahBlah BlahBlah BlahBlah SomethingYYY BlahBlah
SomethingGGG SomethingGGG SomethingGGG SomethingGGG 
"

I have two regex patterns I use to identify the strings in the texts:

pattern1 = re.compile(r'\w+XXX')
pattern2 = re.compile(r'\w+YYY')

The goal is to save the lines containing the patterns plus the preceding line and following line in a new text file.

So the desired output would be:

newtext = "
SomethingDDD SomethingDDD SomethingDDD SomethingDDD 
BlahBlah SomethingXXX BlahBlah BlahBlah BlahBlah BlahBlah 
SomethingEEE SomethingEEE SomethingEEE SomethingEEE 

SomethingGGG SomethingGGG SomethingGGG SomethingGGG 
BlahBlah BlahBlah BlahBlah SomethingYYY BlahBlah
SomethingGGG SomethingGGG SomethingGGG SomethingGGG 
"

This is the relevant piece of my current code:

previous_line = deque()
for text_doc in text_docs:

    with open(text_doc,'r') as f:
        for line in f:
            nextline = next(f).strip()
            prev_line.appendleft(line)
            with open(
                output, "a"                
            ) as Results:
                if re.search(pattern1, line):
                    previous_line = "".join(previous_line.popleft())
                    found_pattern1 = previous_line + line + nextline
                    Results.write(f"\n\nInstance of pattern1: \n{found_pattern1}\n\n")
                elif re.search(pattern2, line):
                    previous_line = "".join(previous_line.popleft())
                    found_pattern2 = previous_line + line + nextline
                    Results.write(f"\n\nInstance of pattern2: \n{found_pattern2}\n\n")
  
            prev_line.clear()

What I'm getting, however, is:

newtext = "
BlahBlah SomethingXXX BlahBlah BlahBlah BlahBlah BlahBlah 
BlahBlah SomethingXXX BlahBlah BlahBlah BlahBlah BlahBlah 
SomethingEEE SomethingEEE SomethingEEE SomethingEEE 

BlahBlah BlahBlah BlahBlah SomethingYYY BlahBlah
BlahBlah BlahBlah BlahBlah SomethingYYY BlahBlah
SomethingGGG SomethingGGG SomethingGGG SomethingGGG"

What am I doing wrong, and what do I have to change to achieve my goal?

CodePudding user response:

You can join pattern1 and pattern2 with an alternation pattern and include the preceding and following lines with (?:.*\n)?, and use re.findall to find all matches:

patterns = [r'\w XXX', r'\w YYY']
new_text = '\n'.join(re.findall(rf"(?:.*\n)?(?:.*(?:{'|'.join(patterns)}).*\n)+(?:.*\n)?", text1))

Demo: https://replit.com/@blhsing/RosybrownGargantuanAnalysts
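
For reference, here is a self-contained sketch of the same regex idea that can be run without the demo link. The shortened sample text and the names `block` and `matches` are only illustrative, not part of the original answer. Two caveats, as far as I can tell: every line must end with a newline for the pattern to see it (a matching last line without a trailing newline is skipped), and when two matching lines are close together, a context line is consumed by the first match and not repeated for the second.

```python
import re

# Shortened stand-in for the sample file; every line ends with "\n".
text1 = (
    "SomethingCCC SomethingCCC\n"
    "SomethingDDD SomethingDDD\n"
    "BlahBlah SomethingXXX BlahBlah\n"
    "SomethingEEE SomethingEEE\n"
    "SomethingFFF SomethingFFF\n"
    "SomethingGGG SomethingGGG\n"
    "BlahBlah SomethingYYY BlahBlah\n"
    "SomethingHHH SomethingHHH\n"
)

patterns = [r'\w+XXX', r'\w+YYY']

# Optional preceding line, one or more matching lines, optional following line.
block = re.compile(rf"(?:.*\n)?(?:.*(?:{'|'.join(patterns)}).*\n)+(?:.*\n)?")

matches = block.findall(text1)   # no capture groups, so full matches are returned
new_text = '\n'.join(matches)    # a blank line separates the blocks
print(new_text)
```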

CodePudding user response:

I managed to find a solution. Here it is in case it can help somebody:

import re

# text_docs, pattern1 and pattern2 are the ones from the question.
for text_doc in text_docs:
    print(text_doc)
    with open(text_doc, 'r') as f:
        lines = f.readlines()
    for i, line in enumerate(lines):
        if re.search(pattern1, line) or re.search(pattern2, line):
            # Slice rather than index: lines[i-1] would wrap around to the
            # last line when i == 0, and lines[i+1] would raise IndexError
            # (not StopIteration) on the final line.
            output_lines = lines[max(i - 1, 0):i + 2]

            with open("output.txt", "a") as Results:
                complete_output = ''.join(output_lines)
                Results.write('============================================\n')
                Results.write(f"\nLines with pattern: \n{complete_output}\n\n")
                Results.write('============================================\n')
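
As for why the loop in the question misbehaved (assuming `prev_line` and `previous_line` refer to the same deque): calling `next(f)` inside `for line in f` consumes an extra line on every iteration, so every second line is never tested against the patterns, and it may raise StopIteration at the end of the file. The current line is also pushed onto the deque before `popleft()` is called, so the "previous" line is really the current one, which is why the matching line shows up twice. A minimal sketch of the index-based fix as a reusable helper follows; `lines_with_context` and its `before`/`after` parameters are hypothetical names chosen for illustration.

```python
import re

def lines_with_context(lines, patterns, before=1, after=1):
    """Yield, for each line matching any pattern, that line plus its neighbours."""
    compiled = [re.compile(p) for p in patterns]
    for i, line in enumerate(lines):
        if any(p.search(line) for p in compiled):
            # max() keeps the slice from wrapping around at the start;
            # slicing past the end of the list is safe in Python.
            start = max(i - before, 0)
            yield lines[start:i + after + 1]

# Small stand-in for the result of f.readlines().
sample = [
    "SomethingAAA\n",
    "SomethingDDD\n",
    "BlahBlah SomethingXXX BlahBlah\n",
    "SomethingEEE\n",
    "SomethingGGG\n",
    "BlahBlah SomethingYYY\n",
]

blocks = list(lines_with_context(sample, [r'\w+XXX', r'\w+YYY']))
for found in blocks:
    print(''.join(found))
```

The second block has only two lines because the match is on the last line of the sample, so there is no following line to include.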
