I need to write a regular expression that captures the ids
from the CLPM_FPGA_id
substring, but only when the text block contains no FAIL
status.
The solution must be a regex pattern only
(split and other Python methods are not allowed).
import re
string = """PASS: PLL Lock signal state at reset is 0.
PASS: PLL Lock signal state is 1.
PASS: PLL Locked within expected time: 5000000 ps.
Requirements verified: CLPM_FPGA_31

PASS: System Clock high period is 8000 ps.
FAIL: System Clock low period is 8000 ps.
PASS: System Clock period is 16000 ps.
Requirements verified: CLPM_FPGA_32

PASS: System Clock to IFC delay is 10 ps.
Requirements verified: CLPM_FPGA_33

PASS: System Clock to IFC delay is 10 ps.
Requirements verified: CLPM_FPGA_34

FAIL: System Clock low period is 8000 ps.
PASS: System Clock high period is 8000 ps.
PASS: System Clock period is 16000 ps.
Requirements verified: CLPM_FPGA_35

PASS: System Clock high period is 8000 ps.
PASS: System Clock period is 16000 ps.
FAIL: System Clock low period is 8000 ps.
Requirements verified: CLPM_FPGA_36

FAIL: System Clock low period is 8000 ps.
Requirements verified: CLPM_FPGA_37

PASS: System Clock to IFC delay is 10 ps.
FAIL: System Clock low period is 8000 ps.
Requirements verified: CLPM_FPGA_38

PASS: System Clock to IFC delay is 10 ps.
Requirements verified: CLPM_FPGA_39"""
re_pattern = re.compile(r"(?:(?<=^)|(?<=\n{2}))PASS.*?(?!FAIL).*?CLPM_FPGA_(\d{2})", re.I|re.S)
found = re_pattern.findall(string)
found
My solution returns
['31', '32', '33', '34', '36', '38', '39']
Do you have any suggestions on how to omit the text blocks with a FAIL
status from the regex results?
CodePudding user response:
You might use:
(?:\A|\n{2})PASS:.*(?:\nPASS:.*)*\n(?!FAIL:).*CLPM_FPGA_(\d+)
The pattern matches:
(?:\A|\n{2})
Match either the start of the string or 2 newlines
PASS:.*
Match PASS: and the rest of the line
(?:\nPASS:.*)*
Optionally repeat all lines that start with PASS:
\n
Match a newline
(?!FAIL:)
Assert not FAIL: directly to the right
.*CLPM_FPGA_(\d+)
Match the line with CLPM_FPGA_ and capture 1 or more digits in group 1
re_pattern = re.compile(r"(?:\A|\n{2})PASS:.*(?:\nPASS:.*)*\n(?!FAIL:).*CLPM_FPGA_(\d+)", re.I)
found = re_pattern.findall(string)
print(found)
Output
['31', '33', '34', '39']
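As a rough illustration of the breakdown above, the same pattern can be written in verbose mode (re.X) so that each part carries its own comment. This is only a sketch: the sample text is a hypothetical shortened version of the question's input, and it assumes blocks are separated by exactly one blank line.

```python
import re

# Hypothetical shortened sample: blocks are separated by one blank line.
sample = (
    "PASS: PLL Lock signal state is 1.\n"
    "Requirements verified: CLPM_FPGA_31\n"
    "\n"
    "PASS: System Clock high period is 8000 ps.\n"
    "FAIL: System Clock low period is 8000 ps.\n"
    "Requirements verified: CLPM_FPGA_32\n"
    "\n"
    "FAIL: System Clock low period is 8000 ps.\n"
    "Requirements verified: CLPM_FPGA_37\n"
    "\n"
    "PASS: System Clock to IFC delay is 10 ps.\n"
    "Requirements verified: CLPM_FPGA_39"
)

verbose_pattern = re.compile(
    r"""
    (?:\A|\n{2})        # start of the string, or the blank line before a block
    PASS:.*             # the first line of the block must start with PASS:
    (?:\nPASS:.*)*      # any further lines that start with PASS:
    \n(?!FAIL:)         # a newline, and the next line must not start with FAIL:
    .*CLPM_FPGA_(\d+)   # the requirements line; capture the id digits
    """,
    re.I | re.X,
)

ids = verbose_pattern.findall(sample)
print(ids)  # ['31', '39']
```

Only the first and last blocks consist solely of PASS lines, so only their ids are captured.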
CodePudding user response:
The following approach using re.findall
seemed to work for me:
matches = re.findall(r'(?:\n{2,}|^)PASS:.*?(?:\nPASS:.*?)*?\nRequirements verified: CLPM_FPGA_(\d+)', string)
print(matches) # ['31', '33', '34', '39']
The strategy of this regex is to match blocks, each indicated by a starting PASS
line that is preceded by either two or more newlines or the start of the string. It then matches only subsequent PASS
lines, followed by the requirements line. We capture the ID, which then appears in the output list.
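To see how this strategy behaves when FAIL appears at different positions, here is a small sketch. The sample blocks and their single-digit ids are made up for illustration, and the blocks are assumed to be separated by a blank line.

```python
import re

# Hypothetical sample covering FAIL at the start, middle, and end of a block.
sample = "\n\n".join([
    "PASS: a\nPASS: b\nRequirements verified: CLPM_FPGA_1",
    "FAIL: a\nPASS: b\nRequirements verified: CLPM_FPGA_2",
    "PASS: a\nFAIL: b\nPASS: c\nRequirements verified: CLPM_FPGA_3",
    "PASS: a\nFAIL: b\nRequirements verified: CLPM_FPGA_4",
    "PASS: a\nRequirements verified: CLPM_FPGA_5",
])

pattern = re.compile(
    r"(?:\n{2,}|^)PASS:.*?(?:\nPASS:.*?)*?\nRequirements verified: CLPM_FPGA_(\d+)"
)

matches = pattern.findall(sample)
print(matches)  # ['1', '5']
```

Because the dot does not cross newlines without re.S, a FAIL line anywhere between the first PASS line and the requirements line prevents the block from matching.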