I have a block of text in which I am trying to find lines that contain any number (*) of 0 digits and at least one (+) 1 digit. Explanation:
1234 xxx 00000000000111000000 00000010000100000000 Some text <-- matches
2345 yyy 00000000000000000000 00000000000000000000 Some text <-- does not match
2345 yyy 00000001000000000000 00000000000000000000 Some text <-- matches
3456 zzz 11111111111111111111 11111111111111111111 Some text <-- matches
How to accomplish this? Thanks!
I tried a negative lookahead, but it failed:
\s+\d+ [a-zA-Z]+ (?![0]{20}) (?![0]{20}) ([0-9a-zA-Z ]+)
CodePudding user response:
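Lookaheads are zero-width: they assert something about the text ahead without consuming it. After your two assertions, nothing in the pattern actually matches the columns of 0 and 1 digits.
If the requirement is that the two columns must not both consist only of zeroes, you can cover both columns with a single assertion: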
\d+ [a-zA-Z]+ (?!0{20} 0{20}\b)[01]{20} [01]{20} ([0-9a-zA-Z ]+)
See a regex101 demo.
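As a minimal sketch of how that pattern could be applied in Python (the `+` quantifiers are written out here, and the names `PATTERN` and `has_a_one` are only illustrative, not part of the answer above):

```python
import re

# The pattern from the answer, with `+` quantifiers written out.
# The negative lookahead rejects a line only when BOTH 20-digit
# columns consist entirely of zeroes.
PATTERN = re.compile(
    r"\d+ [a-zA-Z]+ (?!0{20} 0{20}\b)[01]{20} [01]{20} ([0-9a-zA-Z ]+)"
)


def has_a_one(line: str) -> bool:
    """Return True if at least one of the two columns contains a 1."""
    return PATTERN.search(line) is not None


sample = [
    "1234 xxx 00000000000111000000 00000010000100000000 Some text",
    "2345 yyy 00000000000000000000 00000000000000000000 Some text",
    "2345 yyy 00000001000000000000 00000000000000000000 Some text",
    "3456 zzz 11111111111111111111 11111111111111111111 Some text",
]

for line in sample:
    print(has_a_one(line), line)
```

This assumes the columns are always exactly 20 digits wide and separated by single spaces, as in the sample data.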
CodePudding user response:
Here is my shorter version of the regex. It is anchored to the start of a line, so either apply it with re.MULTILINE as in the code below, or test the lines of your file one at a time:
import re
text = '''1234 xxx 00000000000111000000 00000010000100000000 Some text
2345 yyy 00000000000000000000 00000000000000000000 Some text
2345 yyy 00000001000000000000 00000000000000000000 Some text
3456 zzz 11111111111111111111 11111111111111111111 Some text'''
# 0*1+0* matches a first column made of zeroes around a single run of 1s
regex = r'^\d+\s+\w+\s+0*1+0*\s+\d+\s+\w+'
matches = re.findall(regex, text, re.MULTILINE)
for match in matches:
    print(match)
For an explanation and details, please check the regex101 demo.
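If the column width is not fixed, or a 1 may appear only in the second column, a plain string check is one possible alternative to a regex. The sketch below is not taken from either answer; it assumes the two binary columns are always the third and fourth whitespace-separated fields, and the name `line_has_one` is hypothetical:

```python
def line_has_one(line: str) -> bool:
    """Return True if the third or fourth field contains at least one 1."""
    fields = line.split()
    if len(fields) < 4:
        return False  # not enough columns to inspect
    col_a, col_b = fields[2], fields[3]
    # Both columns must consist only of the digits 0 and 1.
    if set(col_a + col_b) - {"0", "1"}:
        return False
    return "1" in col_a or "1" in col_b


text = """1234 xxx 00000000000111000000 00000010000100000000 Some text
2345 yyy 00000000000000000000 00000000000000000000 Some text
2345 yyy 00000001000000000000 00000000000000000000 Some text
3456 zzz 11111111111111111111 11111111111111111111 Some text"""

for line in text.splitlines():
    if line_has_one(line):
        print(line)
```

Splitting on whitespace avoids hard-coding the width of 20, at the cost of no longer validating the rest of the line.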