Let's say I have a text file like this:
1. MarkerOne
Some text
EndMarkerOne
2. Something else
Some more text
EndSomethingElse
3. MarkerTwo
Some Text
EndMarkerTwo
where MarkerOne and MarkerTwo are identical, as are EndMarkerOne and EndMarkerTwo. For example:
1. Notice
Some text
End Notice
2. Blabla
Some other text
End Blabla
3. Notice
Some more text
End Notice
Now I want to extract "Some text" and "Some more text" from the file as two separate strings in a list.
I tried:
import re

# text holds the contents of the file shown above
pattern = re.compile(r"\d\. Notice[\S\t\n\v ]*End Notice")
result = pattern.findall(text)
print(result)
Unfortunately, this returns a single match that spans everything from the first "Notice" to the last "End Notice", not two separate results.
What I need is for each match to end at the next "End Notice", with the search then continuing from there to find the next "Notice".
Any ideas?
Answer:
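Use a non-greedy (lazy) quantifier: change * to *? (see "What is the difference between .*? and .* regular expressions?"). A greedy * first consumes as much as it can and then backtracks only as far as needed, so the match ends at the last "End Notice" in the file. A lazy *? takes as little as possible, so each match ends at the first "End Notice" after its "Notice", and findall returns one result per block.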
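import re

# *? is lazy: each match stops at the first "End Notice" after its "Notice"
ptn = re.compile(r"\d\. Notice[\S\t\n\v ]*?End Notice")
result = ptn.findall(text)  # one string per Notice block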
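Below is a minimal, self-contained sketch. It assumes the whole file has already been read into a string, and here the sample text from the question is inlined. It swaps the [\S\t\n\v ] class for . combined with the re.DOTALL flag, which lets . match newlines too, and compares the greedy and lazy patterns. Because the question asks for only the text between the markers, it also adds a capture group. When a pattern has one group, findall returns just that group's contents, and strip() removes the surrounding newlines.

```python
import re

# Sample input copied from the question (assumption: the file is already read into a string)
text = """1. Notice
Some text
End Notice
2. Blabla
Some other text
End Blabla
3. Notice
Some more text
End Notice
"""

# Greedy: .* runs on to the LAST "End Notice", so everything becomes one match
greedy = re.findall(r"\d\. Notice.*End Notice", text, flags=re.DOTALL)

# Lazy: .*? stops at the FIRST "End Notice" after each "Notice"
lazy = re.findall(r"\d\. Notice.*?End Notice", text, flags=re.DOTALL)

# With one capture group, findall returns only the text between the markers;
# strip() drops the newline before and after each body
bodies = [
    m.strip()
    for m in re.findall(r"\d\. Notice(.*?)End Notice", text, flags=re.DOTALL)
]

print(len(greedy))  # 1
print(len(lazy))    # 2
print(bodies)       # ['Some text', 'Some more text']
```

The capture group goes beyond the answer above. Without it, findall returns each whole block, including the "1. Notice" and "End Notice" marker lines.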