A certain regular expression that should match does not match in Python

Time:11-25

I wrote a short Python script to check whether certain regular expressions match a given text. One expression is giving me trouble: in a regex-testing app on my iPhone it matches the text, but in my Python script it does not. Below are a short script that tests the expression against the text and a photo of the app showing the match. Could someone explain why the expression does not match in Python? Any help would be greatly appreciated. Thanks so much.

# -*- coding:utf-8 -*-

import regex as re

expression = r'((?<=(^|\n)[\w [[:punct:]]]{1,100})(?<!Chapter[ \t]{1,100}[0-9]{1,100})(?<!\w{2}—[\w [[:punct:]]]{1,100})—(?![a-z]))'

text = r'Section 1—From Strength to Weakness'

replacedText, numMatches = re.subn(r'(' + expression + r')', r'<mark>\1</mark>', text)
print('Number of matches: ' + str(numMatches) + '\n' + replacedText)

Photo of RegEx app that shows match

CodePudding user response:

What I would like, if possible, is to get an explanation as to why the regular expression does not match the text in Python

The problem is that [[:punct:]] is placed inside another character class. In [[:punct:]], the outer brackets are the character class itself, and [:punct:] is the POSIX class name inside it. To combine it with other characters, add them inside those same brackets rather than wrapping the whole thing in a second pair.

So write [\w [:punct:]] instead of [\w [[:punct:]]], and do the same for every other occurrence in the expression.

Here is the correction:

expression = r'((?<=(^|\n)[\w [:punct:]]{1,100})(?<!Chapter[ \t]{1,100}[0-9]{1,100})(?<!\w{2}—[\w [:punct:]]{1,100})—(?![a-z]))'
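
To check the correction, the two forms can be run side by side on the sample text. This is only a minimal sketch: it assumes the third-party regex package is installed, and the names TEXT, BROKEN, FIXED and mark_matches are made up for this illustration, not taken from the question.

```python
# Requires the third-party "regex" package (pip install regex). The
# standard-library re module supports neither POSIX classes such as
# [:punct:] nor variable-length lookbehind, so it cannot run this pattern.
import regex

TEXT = 'Section 1—From Strength to Weakness'

# Pattern from the question: [[:punct:]] wrapped in a second pair of brackets.
BROKEN = (r'((?<=(^|\n)[\w [[:punct:]]]{1,100})'
          r'(?<!Chapter[ \t]{1,100}[0-9]{1,100})'
          r'(?<!\w{2}—[\w [[:punct:]]]{1,100})—(?![a-z]))')

# Pattern from the answer: [:punct:] placed directly inside one pair of brackets.
FIXED = (r'((?<=(^|\n)[\w [:punct:]]{1,100})'
         r'(?<!Chapter[ \t]{1,100}[0-9]{1,100})'
         r'(?<!\w{2}—[\w [:punct:]]{1,100})—(?![a-z]))')


def mark_matches(pattern, text):
    """Wrap each match of group 1 in <mark> tags; return (new_text, count)."""
    return regex.subn(pattern, r'<mark>\1</mark>', text)


broken_text, broken_count = mark_matches(BROKEN, TEXT)
fixed_text, fixed_count = mark_matches(FIXED, TEXT)

print('Broken pattern matches:', broken_count)
print('Fixed pattern matches: ', fixed_count)
print(fixed_text)
```

Because the expression already has an outer capturing group, \1 refers to the whole match here and no extra wrapping parentheses are needed.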