Regex pattern to catch 'ID' in text blocks without 'FAIL' status (regex only)

I need to write a regular expression that captures the IDs from the CLPM_FPGA_<id> substrings, but only for text blocks that contain no FAIL status. The solution must be a regex pattern only (split and other Python methods are not allowed).

import re

string = """PASS: PLL Lock signal state at reset is 0.
PASS: PLL Lock signal state is 1.
PASS: PLL Locked within expected time: 5000000 ps.
Requirements verified: CLPM_FPGA_31

PASS: System Clock high period is 8000 ps.
FAIL: System Clock low period is 8000 ps.
PASS: System Clock period is 16000 ps.
Requirements verified: CLPM_FPGA_32

PASS: System Clock to IFC delay is 10 ps.
Requirements verified: CLPM_FPGA_33

PASS: System Clock to IFC delay is 10 ps.
Requirements verified: CLPM_FPGA_34

FAIL: System Clock low period is 8000 ps.
PASS: System Clock high period is 8000 ps.
PASS: System Clock period is 16000 ps.
Requirements verified: CLPM_FPGA_35

PASS: System Clock high period is 8000 ps.
PASS: System Clock period is 16000 ps.
FAIL: System Clock low period is 8000 ps.
Requirements verified: CLPM_FPGA_36

FAIL: System Clock low period is 8000 ps.
Requirements verified: CLPM_FPGA_37

PASS: System Clock to IFC delay is 10 ps.
FAIL: System Clock low period is 8000 ps.
Requirements verified: CLPM_FPGA_38

PASS: System Clock to IFC delay is 10 ps.
Requirements verified: CLPM_FPGA_39"""

re_pattern = re.compile(r"(?:(?<=^)|(?<=\n{2}))PASS.+?(?!FAIL).+?CLPM_FPGA_(\d{2})", re.I|re.S)

found = re_pattern.findall(string)

found

My solution returns

['31', '32', '33', '34', '36', '38', '39']

Do you have any suggestions on how to omit the text blocks with a FAIL status from the regex pattern results?

CodePudding user response：

You might use:

(?:\A|\n{2})PASS:.*(?:\nPASS:.*)*\n(?!FAIL:).*CLPM_FPGA_(\d+)

(?:\A|\n{2}) Match either the start of the string or 2 newlines
PASS:.* Match PASS: and the rest of the line
(?:\nPASS:.*)* Optionally repeat all lines that start with PASS:
\n Match a newline
(?!FAIL:) Assert not FAIL: directly to the right
.*CLPM_FPGA_(\d+) Match the line with CLPM_FPGA_ and capture 1 or more digits in group 1


re_pattern = re.compile(r"(?:\A|\n{2})PASS:.*(?:\nPASS:.*)*\n(?!FAIL:).*CLPM_FPGA_(\d+)", re.I)
found = re_pattern.findall(string)

print(found)

Output

['31', '33', '34', '39']
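
For readability, the same pattern can also be written with re.VERBOSE so that each part carries its own comment. This is only a sketch: the shortened sample string and the names sample, block_pattern and ids are assumptions made for illustration, and re.I is left out because the log labels are already uppercase.

```python
import re

# Shortened sample in the same format as the question's log (an assumption
# made for brevity; it is not the full string from the question).
sample = (
    "PASS: PLL Lock signal state is 1.\n"
    "Requirements verified: CLPM_FPGA_31\n"
    "\n"
    "PASS: System Clock high period is 8000 ps.\n"
    "FAIL: System Clock low period is 8000 ps.\n"
    "Requirements verified: CLPM_FPGA_32\n"
    "\n"
    "FAIL: System Clock low period is 8000 ps.\n"
    "Requirements verified: CLPM_FPGA_37\n"
    "\n"
    "PASS: System Clock to IFC delay is 10 ps.\n"
    "Requirements verified: CLPM_FPGA_39"
)

# In VERBOSE mode, whitespace inside the pattern is ignored and "#" starts
# a comment, so the pattern can be laid out one part per line.
block_pattern = re.compile(
    r"""
    (?:\A|\n{2})        # start of the string, or the blank line between blocks
    PASS:.*             # the block must open with a PASS line
    (?:\nPASS:.*)*      # any number of further PASS lines
    \n(?!FAIL:)         # the following line must not start with FAIL:
    .*CLPM_FPGA_(\d+)   # the requirements line; capture the ID digits
    """,
    re.VERBOSE,
)

ids = block_pattern.findall(sample)
print(ids)  # ['31', '39']
```

Blocks 32 and 37 are dropped because a FAIL line appears before the requirements line, so the pattern cannot reach CLPM_FPGA_ from the start of those blocks.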

CodePudding user response：

The following approach using re.findall seemed to work for me:

matches = re.findall(r'(?:\n{2,}|^)PASS:.*?(?:\nPASS:.*?)*?\nRequirements verified: CLPM_FPGA_(\d+)', string)
print(matches)  # ['31', '33', '34', '39']

The strategy of this regex is to match whole blocks. Each block must open with a PASS line that is preceded by either two or more newlines or the start of the string. After that, the pattern accepts only further PASS lines, followed directly by the requirements line. A block containing a FAIL line therefore cannot match. The ID is captured in group 1 and appears in the output list.
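
To see which text each match actually covers, the pattern can be run with finditer on a shortened log. This is a minimal sketch: the sample string and the names sample, pattern and matched_ids are assumptions for illustration, not part of the original answer.

```python
import re

# Shortened sample in the same format as the question's log (an assumption
# made for brevity; it is not the full string from the question).
sample = (
    "PASS: PLL Lock signal state at reset is 0.\n"
    "PASS: PLL Lock signal state is 1.\n"
    "Requirements verified: CLPM_FPGA_31\n"
    "\n"
    "PASS: System Clock high period is 8000 ps.\n"
    "PASS: System Clock period is 16000 ps.\n"
    "FAIL: System Clock low period is 8000 ps.\n"
    "Requirements verified: CLPM_FPGA_36\n"
    "\n"
    "FAIL: System Clock low period is 8000 ps.\n"
    "Requirements verified: CLPM_FPGA_37\n"
    "\n"
    "PASS: System Clock to IFC delay is 10 ps.\n"
    "Requirements verified: CLPM_FPGA_39"
)

# Without re.MULTILINE, "^" matches only at the very start of the string,
# so a block can begin there or right after two or more newlines.
pattern = re.compile(
    r"(?:\n{2,}|^)PASS:.*?(?:\nPASS:.*?)*?\nRequirements verified: CLPM_FPGA_(\d+)"
)

matched_ids = []
for m in pattern.finditer(sample):
    matched_ids.append(m.group(1))
    # group(0) is the whole block: only PASS lines plus the requirements line.
    print(repr(m.group(0)))

print(matched_ids)  # ['31', '39']
```

Block 36 is skipped even though it starts with two PASS lines, because the FAIL line sits between them and the requirements line. Block 37 is skipped because it does not open with PASS at all.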