Python Regex: Match start of line, or semicolon, or start of string, non-capturing group

Time:02-23

I am trying to create a Python regex that matches either:

  1. The start of the string
  2. The start of a line
  3. Or a semicolon

Up to this point the match must not consume any of the string. Once one of the above is found, the pattern should match optional whitespace and then the word import (which is captured).

The regex (with the multiline and global flags, mg):

(?<=^|;)\s*(import)

This fails because a lookbehind must be fixed-width in Python. Test input:

import sadfsda; import asdf sdaf 
import asdfas dfasdf
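The failure can be reproduced with a minimal sketch like the one below. It only tries to compile the pattern from the question; the variable names are illustrative, and the exact wording of the error message may vary between Python versions.

```python
import re

# The pattern from the question: one branch (^) has width 0 and the
# other (;) has width 1, so the lookbehind is not fixed-width.
pattern = r'(?<=^|;)\s*(import)'

try:
    re.compile(pattern, re.M)
    error_message = None
except re.error as exc:
    # The built-in re module rejects variable-width lookbehinds at compile time.
    error_message = str(exc)

print(error_message)
```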

Note: [^] does not answer the question, because how do you specify a fixed-width lookbehind that matches my requirements? This was just one of many attempts that didn't work.

CodePudding user response:

You can use

(?<![^;\n])\s*(import)

See the regex demo. Details:

  • (?<![^;\n]) - a negative lookbehind that matches a location that is not immediately preceded by a char other than ; or a newline (LF, line feed), i.e. a location at the start of the string, or right after a ; or a newline
  • \s* - zero or more whitespaces
  • (import) - Group 1: import string.

The \n in the class is necessary: a negated character class matches a newline unless \n is part of the class, so without it the lookbehind would fail right after a line break. Add \n to the negated character class whenever you want to match positions at the start of any line with a negative lookbehind like this.
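To illustrate the point about \n, here is a small sketch comparing the lookbehind with and without \n in the negated class, using the sample text from the question. The variable names are made up for the example.

```python
import re

s = 'import sadfsda; import asdf sdaf \nimport asdfas dfasdf'

# With \n in the class: start of string, after ';', and after a newline all qualify.
with_newline = re.findall(r'(?<![^;\n])\s*(import)', s)

# Without \n: a newline counts as "a char other than ;", so the
# import at the start of the second line is skipped.
without_newline = re.findall(r'(?<![^;])\s*(import)', s)

print(len(with_newline))     # 3
print(len(without_newline))  # 2
```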

See the Python demo:

import re
s='''import sadfsda; import asdf sdaf 
import asdfas dfasdf'''
rx = r'(?<![^;\n])\s*(import)'
print( re.sub(rx, r'{{\g<0>}}', s) )

Output:

{{import}} sadfsda;{{ import}} asdf sdaf 
{{import}} asdfas dfasdf
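As a possible alternative closer to the original attempt (not part of the answer above), the variable-width lookbehind can be split into an alternation between the ^ anchor and a fixed-width lookbehind, with re.M so that ^ also matches at line starts. The sketch below checks that it produces the same result as the negative lookbehind on the sample text; the variable names are illustrative.

```python
import re

s = 'import sadfsda; import asdf sdaf \nimport asdfas dfasdf'

# Pattern from the answer above.
lookbehind_rx = re.compile(r'(?<![^;\n])\s*(import)')

# Alternative sketch: ^ (start of string or line, via re.M) or a
# fixed-width lookbehind for ';'. Neither branch consumes any text.
alternation_rx = re.compile(r'(?:^|(?<=;))\s*(import)', re.M)

a = lookbehind_rx.sub(r'{{\g<0>}}', s)
b = alternation_rx.sub(r'{{\g<0>}}', s)

print(a == b)
print(b)
```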