I have this regex defined in Python:
multiline_comment_regex = r'(^[ \t]*<#[^>]*#>[ \t]*[\n]*)'
And the testing string:
characters = 'asdfasdfñáéíóú\n\r\t <#somecomment \n\r multiline\t\n\r\t asd#>\nasdf\n #comment'
On regex101.com, the regex works like a charm and matches:
'\t <#somecomment \n\r multiline\t\n\r\t asd#>\n'
But using pandas, it doesn't match anything:
data = pd.DataFrame({'process': [characters, ]})
data['process'].replace({multiline_comment_regex: ''}, regex=True, inplace=True)
Nor does it match with re:
re.match(multiline_comment_regex, characters)
What is wrong with the regex?
Thank you!
CodePudding user response:
You need to account for two things here:
- By default, ^ matches only at the start of the whole string. To match at the start of each line you need the multiline flag, and in this case the inline (?m) option is convenient.
- The line endings seem to include carriage returns (\r) here, as with CRLF, so [ \t]* and \n alone are not enough. It makes sense to match any whitespace (\s*) at the start and end of the pattern.
The following pattern should work:
(?m)^(\s*<#[^>]*#>\s*)
See a regex test in the environment with CRLF line endings.
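As a minimal sketch of how the pattern above behaves with the test string from the question, using only Python's re module. The names pattern, match and cleaned are introduced here for illustration. Note that re.match only tries the very start of the string, even in multiline mode, so re.search or re.sub is needed to find a comment further in:

```python
import re

# Test string copied from the question.
characters = 'asdfasdfñáéíóú\n\r\t <#somecomment \n\r multiline\t\n\r\t asd#>\nasdf\n #comment'

# Pattern from the answer: (?m) lets ^ match after every \n,
# and \s* absorbs the \r, tabs and spaces around the comment.
pattern = r'(?m)^(\s*<#[^>]*#>\s*)'

# re.match anchors at position 0 of the string, so it still finds nothing.
print(re.match(pattern, characters))  # None

# re.search scans the whole string and finds the comment block.
match = re.search(pattern, characters)
print(repr(match.group(1)))

# re.sub removes the comment together with its surrounding whitespace.
cleaned = re.sub(pattern, '', characters)
print(repr(cleaned))  # 'asdfasdfñáéíóú\nasdf\n #comment'
```

The same pattern string should also work in the pandas call from the question, data['process'].replace({pattern: ''}, regex=True), assuming pandas passes it through to re unchanged.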