I'm trying to write a regex pattern that fails to match a -- comment if anything other than whitespace precedes it on its line, for example:
--hello (match)
--goodbye (match)
ROW_NUMBER() OVER (ORDER BY DATE) --date (fail)
    --comment with some indentation (match)
    --another comment with some indentation (match)
The closest I've got is this pattern I made, (?<!.)--.*\n, which gives me this result:
--hello (match)
--goodbye (match)
ROW_NUMBER() OVER (ORDER BY DATE) --date (fail)
    --comment with some indentation (fail)
    --another comment with some indentation (fail)
I've tried (?<!\s)--.*\n and (?<=\S)--.*\n, but both return no matches at all.
EDIT: here is a regexr.com example that illustrates the issue more clearly: regexr.com/6j0mt
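The result reported for (?<!.)--.*\n can be reproduced with Python's built-in re module. This is a minimal sketch, assuming the indented lines start with four spaces (the exact indentation is not given). The negative lookbehind (?<!.) only succeeds where no non-newline character precedes the position, so a leading space disqualifies an indented comment just as "ROW_NUMBER() ..." does.

```python
import re

# Hypothetical sample input; the indented line is assumed to use four spaces.
text = (
    "--hello\n"
    "--goodbye\n"
    "ROW_NUMBER() OVER (ORDER BY DATE) --date\n"
    "    --comment with some indentation\n"
)

# (?<!.) fails whenever any character other than a newline sits right before
# the position, so only comments starting at column 0 survive.
matches = re.findall(r"(?<!.)--.*\n", text)
print(matches)  # ['--hello\n', '--goodbye\n']
```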
Answer:
With the PyPI regex module, you can use:
import regex
text = r"""--hello
--goodbye
ROW_NUMBER() OVER (ORDER BY DATE) --date
    --comment with some indentation
    --another comment with some indentation"""
print(regex.findall(r'(?<=^[^\S\r\n]*)--.*', text, regex.M))
# => ['--hello', '--goodbye', '--comment with some indentation', '--another comment with some indentation']
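The lookbehind in the regex pattern above is variable-width (it contains *). The third-party regex module supports that, but Python's built-in re only accepts fixed-width lookbehinds, which is why a separate re version is needed. A quick check, as a minimal sketch:

```python
import re

lookbehind_pattern = r"(?<=^[^\S\r\n]*)--.*"

try:
    re.compile(lookbehind_pattern, re.M)
    supported = True
except re.error as exc:
    # The built-in re engine requires a fixed-width lookbehind; the `*`
    # quantifier inside (?<=...) makes its width variable.
    supported = False
    print("re rejects the pattern:", exc)
```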
Or, with Python's built-in re module:
import re
text = r"""--hello
--goodbye
ROW_NUMBER() OVER (ORDER BY DATE) --date
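    --comment with some indentation
    --another comment with some indentation"""
print(re.findall(r'^[^\S\r\n]*(--.*)', text, re.M))
# => ['--hello', '--goodbye', '--comment with some indentation', '--another comment with some indentation']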
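Note that the answer's patterns end in .* rather than the question's .*\n. A minimal sketch of why that matters when the last line has no trailing newline, as in the triple-quoted sample strings (the short input here is hypothetical):

```python
import re

# Hypothetical input; the last line has no trailing newline.
text = "--hello\n    --comment with some indentation"

# Ending in .* lets the final comment match at the end of the string.
without_newline = re.findall(r"^[^\S\r\n]*(--.*)", text, re.M)

# Requiring \n drops a comment on the last line when no newline follows it.
with_newline = re.findall(r"^[^\S\r\n]*(--.*)\n", text, re.M)

print(without_newline)  # ['--hello', '--comment with some indentation']
print(with_newline)     # ['--hello']
```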
Pattern details
(?<=^[^\S\r\n]*) - a positive lookbehind that matches a location immediately preceded by the start of the string/line and zero or more whitespace characters other than CR and LF
^ - start of a string (here, of a line, because the re.M / regex.M option is used)
[^\S\r\n]* - zero or more characters other than non-whitespace, CR and LF characters (that is, any whitespace except carriage returns and line feeds)
(--.*) - Group 1: -- and the rest of the line (.* matches zero or more characters other than line break characters, as many as possible)
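The re version can also be written with re.X (verbose mode) so that each piece listed above carries its explanation inline. This is a minimal sketch with a hypothetical sample string; it also shows what changes when re.M is left out.

```python
import re

# Same pattern as the re version above, spelled out with re.X so each
# component is annotated.
comment_line = re.compile(
    r"""
    ^            # start of a line, because re.M is set
    [^\S\r\n]*   # zero or more whitespace chars except CR and LF
    (--.*)       # group 1: "--" plus the rest of the line
    """,
    re.M | re.X,
)

text = "ROW_NUMBER() OVER (ORDER BY DATE) --date\n  --indented\n--plain"
multiline = comment_line.findall(text)

# Without re.M, ^ only matches at the very start of the string, and the first
# line is not a comment, so nothing is found.
single_line = re.findall(r"^[^\S\r\n]*(--.*)", text)

print(multiline)    # ['--indented', '--plain']
print(single_line)  # []
```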