Suppose I have the following text:
Products to be destroyed: «Prabo», «Palox 2000», «Remadon strong» (Rule). The customers «Dilora» and «Apple» has to be notified.
I need to match every string within the «» quotes, but ONLY in the sentence that starts with the "Products to be destroyed:" pattern or ends with the (Rule) pattern.
In other words, in this example I do NOT want to match Dilora or Apple.
The regex to get the quoted contents in the capturing group is:
«(.+?)»
Is it possible to "anchor" it to either a following pattern (such as (Rule)) or a prior pattern (such as "Products to be destroyed:")?
This is my saved attempt on regex101
Thank you very much.
CodePudding user response:
You can first match the whole sentence that contains at least one «...» part, and when there is a match, extract all the parts from it, for example with re.findall.
The example data seems to be a single sentence ending in a dot. In that case you can require at least one «...» part and use the negated character class [^.] to match any char except a dot, so the match cannot run past the end of that sentence.
Regex demo for at least a single match, and another demo to match the separate parts afterwards
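import re

# Match the sentence that starts with the prefix and contains at least one «...» part
regex = r"\bProducts to be destroyed:[^.]*«[^«»]*»[^.]*\."
s = 'Products to be destroyed: «Prabo», «Palox 2000», «Remadon strong» (Rule). The customers «Dilora» and «Apple» has to be notified.'

result = re.search(regex, s)
if result:
    # Extract every «...» part, but only from the matched sentence
    print(re.findall(r"«([^«»]*)»", result.group()))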
Output
['Prabo', 'Palox 2000', 'Remadon strong']
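The question also asks about anchoring on the trailing (Rule) marker. As a rough sketch of one way to cover both anchors, the text can be split into dot-terminated sentences and each sentence checked for the prefix or the suffix before extracting the quoted parts. The helper name quoted_in_anchored_sentences and its prefix/suffix parameters are made up for illustration, and it assumes that dots only occur at sentence ends (a product name containing a dot would break the split).

```python
import re

# Hypothetical helper, not part of the answer above.
QUOTED = re.compile(r"«([^«»]*)»")

def quoted_in_anchored_sentences(text,
                                 prefix="Products to be destroyed:",
                                 suffix="(Rule)"):
    """Return the «...» contents from sentences that start with `prefix`
    or end with `suffix` followed by a dot."""
    found = []
    # Assumption: a sentence is a run of non-dot characters ending in a dot
    for sentence in re.findall(r"[^.]+\.", text):
        sentence = sentence.strip()
        if sentence.startswith(prefix) or sentence.endswith(suffix + "."):
            found.extend(QUOTED.findall(sentence))
    return found

s = ('Products to be destroyed: «Prabo», «Palox 2000», «Remadon strong» (Rule). '
     'The customers «Dilora» and «Apple» has to be notified.')
print(quoted_in_anchored_sentences(s))
```

With the sample text this should print the same three product names as above, because the second sentence neither starts with the prefix nor ends with (Rule).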