Python extract repeating substring(s) between equal markers

Time:12-13

Let's say I have a text file as follows:

 1. MarkerOne
 Some text
 EndMarkerOne
 2. Something else
 Some more text
 EndSomethingElse
 3. MarkerTwo
 Some Text
 EndMarkerTwo

where MarkerOne and MarkerTwo are the same, as are EndMarkerOne and EndMarkerTwo. For example:

    1. Notice 
    Some text 
    End Notice
    2. Blabla 
    Some other text 
    End Blabla
    3. Notice 
    Some more text
    End Notice

Now I want to extract the "some text" and the "some more text" from the file as two different substrings in a list.

I tried:

    import re
    # raw string so the backslashes reach the regex engine unchanged
    pattern = re.compile(r"\d . Notice[\S\t\n\v ]*End Notice")
    result = pattern.findall(text)
    print(result)

Unfortunately this gives me all text between the first "Notice" and the last "End Notice" and not two separate results.

What I need is to tell the script to separate the results by each "End Notice" and start the next with finding the pattern again.

Any idea?

CodePudding user response:

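Use a non-greedy quantifier: change * to *?. The greedy * consumes as much as it can, so the match runs to the last "End Notice"; the non-greedy *? stops at the first one, and findall then resumes searching after it. See "What is the difference between .*? and .* regular expressions?"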

    import re

    ptn = re.compile(r"\d . Notice[\S\t\n\v ]*?End Notice")
    result = ptn.findall(text)
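
To get only the text between the markers rather than the whole block, a capture group can be combined with the non-greedy quantifier. The following is a minimal sketch, not the pattern from the post: it assumes each block starts with a line of the form "<number>. Notice" and ends with a line starting with "End Notice", as in the sample above. The sample string and the names `lazy`, `greedy` and `bodies` are made up for illustration.

```python
import re

# Sample input modelled on the question (an assumption about the real file).
text = """1. Notice
Some text
End Notice
2. Blabla
Some other text
End Blabla
3. Notice
Some more text
End Notice
"""

flags = re.DOTALL | re.MULTILINE  # "." spans newlines; "^" matches at each line start

# Non-greedy: (.*?) stops at the first following "End Notice" line.
lazy = re.compile(r"^\d+\. Notice[ \t]*\n(.*?)^End Notice", flags)

# Greedy, for comparison: (.*) runs on to the last "End Notice" line.
greedy = re.compile(r"^\d+\. Notice[ \t]*\n(.*)^End Notice", flags)

# With one capture group, findall returns only the captured text.
bodies = [body.strip() for body in lazy.findall(text)]
greedy_matches = greedy.findall(text)

print(bodies)               # two separate substrings
print(len(greedy_matches))  # a single match spanning all blocks
```

Because findall returns just the group when the pattern has exactly one, the markers themselves are left out of the result. If the real file formats the marker lines differently, the literal parts of the pattern need adjusting.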