Regex: Provide match for beginning of a sentence ignoring new lines

Time:10-11

import re

string = "This is a sentence. Micky Mouse"

# Raw string so that \. reaches the regex engine unchanged; no f-string is needed
name = re.compile(r"\.?Micky Mouse")
name_match = name.search(string)
print(name_match)

I want a match only if "Micky Mouse" is at the beginning of a new sentence, i.e., only if it follows a dot ("."). However, it should also match regardless of any newlines or spaces between the end of the previous sentence and "Micky Mouse". So the following string should also produce a match: "This is a sentence. \nMicky Mouse"

CodePudding user response:

You can match optional whitespace chars after the dot:

\.\s*Micky Mouse\b

The pattern matches:

  • \.\s* Match a literal dot followed by optional whitespace chars (which can include a newline)
  • Micky Mouse\b Match the literal text Micky Mouse followed by a word boundary
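
As a rough check of the pattern above, here is a minimal Python sketch. The sample strings are made up for illustration (only the first two come from the question), and the pattern is written as a raw string so the backslashes reach the regex engine as-is.

```python
import re

# Pattern from the answer above: a dot, optional whitespace, then the name.
pattern = re.compile(r"\.\s*Micky Mouse\b")

samples = [
    "This is a sentence. Micky Mouse",    # space after the dot
    "This is a sentence. \nMicky Mouse",  # space and newline after the dot
    "This is a sentence.Micky Mouse",     # nothing after the dot
    "Here Micky Mouse is mid-sentence",   # no preceding dot
    "This is a sentence. Micky Mouses",   # \b rejects the longer word
]

# True where the pattern is found somewhere in the sample
results = [bool(pattern.search(s)) for s in samples]
print(results)  # [True, True, True, False, False]
```

Note that the matched text includes the dot and the whitespace, not just the name.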


CodePudding user response:

\s is a character class (not a flag) that matches any whitespace character, including \n.

Something like the following should do the trick. The dot is escaped so that it matches a literal ".", and \s* allows any amount of whitespace:

re.compile(r"\.\s*Micky Mouse")
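
The choice of quantifier and the escaping of the dot both matter here. The sketch below (variable names are illustrative only) compares \s? with \s* on the string from the question, and an unescaped . with \. on a made-up string that contains no dot.

```python
import re

text = "This is a sentence. \nMicky Mouse"  # space plus newline after the dot

at_most_one = re.compile(r"\.\s?Micky Mouse")  # zero or one whitespace char
any_amount = re.compile(r"\.\s*Micky Mouse")   # zero or more whitespace chars

print(at_most_one.search(text))  # None: two whitespace chars follow the dot
print(any_amount.search(text))   # a match object covering ". \nMicky Mouse"

# An unescaped dot matches any character, so it does not enforce a sentence end.
unescaped = re.compile(r".\s*Micky Mouse")
print(bool(unescaped.search("Here Micky Mouse")))   # True
print(bool(any_amount.search("Here Micky Mouse")))  # False
```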

CodePudding user response:

To match only at the beginning of a sentence, and to ignore any whitespace differences after the dot, prepend the match target with (?:^|\.)\s*.

  • (?:...) -> a non-capturing group: it groups the alternatives without creating a numbered group
  • ^|\. -> either the beginning of the string ^ or | a literal dot \.
  • \s* -> any amount of whitespace, including newlines, spaces, tabs, etc.

import re

string = """This is a sentence. Micky Mouse. 
           Micky Mouse again. No Micky Mouse match here."""

# Raw string keeps \. and \s intact; no f-string is needed
pattern = re.compile(r"(?:^|\.)\s*Micky Mouse")
name_match = pattern.finditer(string)
print([match.group(0) for match in name_match])

output:

['. Micky Mouse', '. \n           Micky Mouse']
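
The matches above include the leading dot and whitespace. If only the name itself is wanted, one option (an assumption about the goal, not something the question asks for) is to wrap the name in a capturing group and read group(1), as in this sketch:

```python
import re

text = """This is a sentence. Micky Mouse.
           Micky Mouse again. No Micky Mouse match here."""

# Same prefix as above, with the name wrapped in a capturing group.
pattern = re.compile(r"(?:^|\.)\s*(Micky Mouse)")

# group(1) holds only the name, without the dot or whitespace before it
names = [m.group(1) for m in pattern.finditer(text)]
print(names)  # ['Micky Mouse', 'Micky Mouse']

# The ^ branch also covers a name at the very start of the string.
first = pattern.search("Micky Mouse starts the text.")
print(first.group(1))  # Micky Mouse
```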