PowerShell multiline comment regex works in regex101 but not in pandas or re

Time:05-06

I have this regex defined in Python:

multiline_comment_regex = r'(^[ \t]*<#[^>]*#>[ \t]*[\n]*)'

And this test string:

characters = 'asdfasdfñáéíóú\n\r\t  <#somecomment \n\r multiline\t\n\r\t asd#>\nasdf\n  #comment'

On regex101.com, the regex works like a charm and matches:

'\t  <#somecomment \n\r multiline\t\n\r\t asd#>\n'

But, using pandas, it doesn't match anything:

import pandas as pd
data = pd.DataFrame({'process': [characters, ]})
data['process'].replace({multiline_comment_regex: ''}, regex=True, inplace=True)

Nor does it match with re:

import re
re.match(multiline_comment_regex, characters)

What is wrong with the regex?

Thank you!

CodePudding user response:

You need to account for two things here:

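  • By default, ^ matches only at the start of the whole string. To make it match at the start of every line you need the multiline flag, and an inline (?m) option is convenient here. Note that re.match only ever tries the start of the string, even in multiline mode, so test with re.search or re.sub instead.
  • The string contains carriage returns next to the newlines (here in \n\r order), so the character right after a line start can be \r, which [ \t]* does not match. It makes sense to match any whitespace with \s* at the start and end of the pattern.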
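Both points can be checked directly with re. The following is only a minimal sketch that uses the pattern and test string from the question; the variable names are illustrative, and removing the \r characters is done purely to show that they are what blocks the match.

```python
import re

# Pattern and test string copied from the question.
original = r'(^[ \t]*<#[^>]*#>[ \t]*[\n]*)'
characters = 'asdfasdfñáéíóú\n\r\t  <#somecomment \n\r multiline\t\n\r\t asd#>\nasdf\n  #comment'

# re.match only tries position 0, where the string starts with "asdf".
no_match_at_start = re.match(original, characters)

# With re.search and MULTILINE, ^ also matches right after each "\n",
# but the next character there is "\r", which [ \t]* cannot consume.
no_match_multiline = re.search(original, characters, flags=re.MULTILINE)

# Illustration only: once the "\r" characters are removed, the original
# pattern does match in multiline mode.
match_without_cr = re.search(original, characters.replace('\r', ''), flags=re.MULTILINE)

print(no_match_at_start)
print(no_match_multiline)
print(repr(match_without_cr.group(1)))
```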

The following pattern should work:

(?m)^(\s*<#[^>]*#>\s*)

See a regex test in the environment with CRLF line endings.
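As a quick check with re, here is a minimal sketch that applies the suggested pattern to the test string from the question. It uses re.search and re.sub rather than re.match, because re.match only tries the start of the string even in multiline mode. The variable names are illustrative.

```python
import re

# Pattern from the answer: inline multiline flag, \s* around the comment.
fixed = r'(?m)^(\s*<#[^>]*#>\s*)'
characters = 'asdfasdfñáéíóú\n\r\t  <#somecomment \n\r multiline\t\n\r\t asd#>\nasdf\n  #comment'

# re.search scans the whole string, so the ^ after the first "\n" is tried.
match = re.search(fixed, characters)

# re.sub removes the comment block together with its surrounding whitespace.
cleaned = re.sub(fixed, '', characters)

print(repr(match.group(1)))
print(repr(cleaned))
```

Since the flag is part of the pattern string itself, the same string should presumably also work in the pandas replace call from the question with regex=True, though that is not tested here. Note that [^>]* stops at the first >, so a comment that contains a > character in its body would not be matched by this pattern.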
