import re

string = "This is a sentence. Micky Mouse"
name = re.compile(r"\.?Micky Mouse")
name_match = name.search(string)
print(name_match)
I want a match only if "Micky Mouse" is at the beginning of a new sentence, i.e., only if it follows a dot ".".
However, there should also be a match regardless of any newlines or spaces between the end of the previous sentence and "Micky Mouse". So the following string should also produce a match: "This is a sentence. \nMicky Mouse"
CodePudding user response:
You can match optional whitespace chars after the dot:
\.\s*Micky Mouse\b
The pattern matches:
\.\s*
Match a dot and optional whitespace chars (which can also match a newline)
Micky Mouse\b
Match "Micky Mouse" literally, followed by a word boundary
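As a quick check, here is a minimal sketch that applies this pattern to the two strings from the question and to one counterexample. The counterexample string and the variable names are made up for illustration.

```python
import re

pattern = re.compile(r"\.\s*Micky Mouse\b")

same_line = "This is a sentence. Micky Mouse"
next_line = "This is a sentence. \nMicky Mouse"
mid_sentence = "This is Micky Mouse"  # hypothetical counterexample: no dot before the name

print(bool(pattern.search(same_line)))     # True
print(bool(pattern.search(next_line)))     # True
print(bool(pattern.search(mid_sentence)))  # False
```

Note that the matched text includes the dot and the whitespace, not just the name.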
CodePudding user response:
\s is a character class that matches any whitespace character, including \n.
Something like the following should do the trick:
re.compile(r"\.\s*Micky Mouse")
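Two details matter in a pattern like this: whether the dot is escaped, and whether the whitespace quantifier is ? or *. The sketch below compares three variants. The sample strings and variable names are assumptions chosen for illustration.

```python
import re

any_char = re.compile(r".\s?Micky Mouse")   # unescaped "." matches any character except newline
one_ws = re.compile(r"\.\s?Micky Mouse")    # literal dot, at most one whitespace character
strict = re.compile(r"\.\s*Micky Mouse")    # literal dot, any amount of whitespace

no_dot = "Here is Micky Mouse"                # name is mid-sentence
two_ws = "This is a sentence. \nMicky Mouse"  # space plus newline after the dot

print(bool(any_char.search(no_dot)))  # True: a false positive, no dot precedes the name
print(bool(strict.search(no_dot)))    # False
print(bool(one_ws.search(two_ws)))    # False: \s? cannot cover both the space and the newline
print(bool(strict.search(two_ws)))    # True
```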
CodePudding user response:
To match only at the beginning of a sentence and ignore any whitespace differences after the dot, prepend (?:^|\.)\s* to the match target.
(?:) -> a non-capturing group (it doesn't create a group)
^|\. -> either the beginning of the string (^) or (|) a literal dot (\.)
\s* -> any amount of whitespace, including newlines, spaces, tabs, etc.
import re
string= """This is a sentence. Micky Mouse.
Micky Mouse again. No Micky Mouse match here."""
pattern = re.compile(r"(?:^|\.)\s*Micky Mouse")
name_match = re.finditer(pattern, string)
print([match.group(0) for match in name_match])
output:
['. Micky Mouse', '.\nMicky Mouse']
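The matches above include the leading dot and whitespace. If only the name is wanted, one option is to wrap it in a capturing group and read group(1). This is a sketch and not part of the original answer; the names and starts variables are introduced here for illustration.

```python
import re

string = """This is a sentence. Micky Mouse.
Micky Mouse again. No Micky Mouse match here."""

# Same prefix as above, with the name itself in a capturing group.
pattern = re.compile(r"(?:^|\.)\s*(Micky Mouse)")

names = [m.group(1) for m in pattern.finditer(string)]
starts = [m.start(1) for m in pattern.finditer(string)]

print(names)   # ['Micky Mouse', 'Micky Mouse']
print(starts)  # character offsets where each name begins
```

Without re.MULTILINE, ^ only matches at the very start of the string, so a name at the start of a later line is matched through the preceding dot and newline rather than through ^.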