Regex: Provide match for beginning of a sentence ignoring new lines

import re

string = "This is a sentence. Micky Mouse"

# The ? makes the dot optional, so this matches "Micky Mouse" anywhere
name = re.compile(r"\.?Micky Mouse")
name_match = name.search(string)
print(name_match)

I want to ensure that a match is only returned if "Micky Mouse" is at the beginning of a new sentence, i.e., only if it follows a dot ("."). However, the match should also succeed regardless of any newlines or spaces between the end of the previous sentence and "Micky Mouse". So the following string should also produce a match: "This is a sentence. \nMicky Mouse"

Answer:

You can match optional whitespace chars after the dot:

\.\s*Micky Mouse\b

The pattern matches:

\.\s*  Match a literal dot followed by optional whitespace characters (which can include newlines)
Micky Mouse\b  Match "Micky Mouse" literally, followed by a word boundary
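A quick check of this pattern in Python, as a minimal sketch: the three sample strings are made up for illustration, covering a space after the dot, a space plus newline after the dot (the case from the question), and no dot at all.

```python
import re

# Literal dot, optional whitespace (spaces, tabs, newlines),
# then the name followed by a word boundary.
pattern = re.compile(r"\.\s*Micky Mouse\b")

# Hypothetical sample inputs for illustration
samples = [
    "This is a sentence. Micky Mouse",    # space after the dot
    "This is a sentence. \nMicky Mouse",  # space and newline after the dot
    "This is a sentence Micky Mouse",     # no dot, so no match expected
]

for text in samples:
    match = pattern.search(text)
    print(repr(text), "->", repr(match.group(0)) if match else None)
```

Using a raw string (r"...") keeps the backslashes in \. and \s intact instead of relying on Python passing unknown escape sequences through.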


Answer:

The \s escape matches any whitespace character, including \n.

Something like the following should do the trick:

re.compile(r"\.\s*Micky Mouse")  # \. is a literal dot; \s* allows any amount of whitespace, including newlines
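A small comparison shows why the dot should be escaped and why \s* is safer than \s?. An unescaped . matches any character except a newline, so text without a dot can still match, while \s? allows at most one whitespace character. The two test strings below are invented for illustration.

```python
import re

loose = re.compile(r".\s?Micky Mouse")    # unescaped dot: any character; at most one whitespace char
strict = re.compile(r"\.\s*Micky Mouse")  # literal dot; any amount of whitespace

no_dot = "A cartoon about Micky Mouse"            # no sentence boundary at all
blank_line = "This is a sentence.\n\nMicky Mouse"  # two newlines after the dot

# loose wrongly matches no_dot and misses blank_line; strict does the opposite
print(bool(loose.search(no_dot)), bool(strict.search(no_dot)))
print(bool(loose.search(blank_line)), bool(strict.search(blank_line)))
```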

Answer:

To match only at the beginning of a sentence, regardless of the whitespace that follows the previous one, prepend the match target with (?:^|\.)\s*.

(?:...) -> a non-capturing group: it groups the alternatives without creating a capture group
^|\. -> either the beginning of the string ^ or | a literal dot \.
\s* -> any amount of whitespace, including newlines, spaces, tabs, etc.

import re

string= """This is a sentence. Micky Mouse. 
           Micky Mouse again. No Micky Mouse match here."""

# Raw string so \. and \s reach the regex engine unchanged
pattern = re.compile(r"(?:^|\.)\s*Micky Mouse")
name_match = pattern.finditer(string)
print([match.group(0) for match in name_match])

output:

['. Micky Mouse', '. \n           Micky Mouse']
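The ^ alternative matters when "Micky Mouse" opens the text itself, a case a dot-only pattern misses. A minimal sketch comparing the two, using an invented sample string:

```python
import re

dot_only = re.compile(r"\.\s*Micky Mouse")
dot_or_start = re.compile(r"(?:^|\.)\s*Micky Mouse")

# Hypothetical text: the name starts the string, appears mid-sentence, then after a dot
text = "Micky Mouse is first. Then Micky Mouse again. \nMicky Mouse ends it."

# dot_only misses the first occurrence; neither matches the mid-sentence one
print([m.group(0) for m in dot_only.finditer(text)])
print([m.group(0) for m in dot_or_start.finditer(text)])
```

Without re.MULTILINE, ^ matches only at the very start of the string, which is what is wanted here; with re.MULTILINE it would also match right after every newline, even when the previous line does not end with a dot.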