How to capture everything from the beginning of a string until every occurrence of a specific string

Time:06-15

How can one capture everything from the beginning of a string until every occurrence of a specific string/pattern using regular expressions in Python?

So, for example, if I have a string like the following, and I want to capture everything up to each occurrence of "UNTIL":

txt = "Here's some text UNTIL for the 1st time, then some more text UNTIL for the 2nd time, and finally more text UNTIL the 3rd time."

Then the output should be as follows:

[
  "Here's some text ",
  "Here's some text UNTIL for the 1st time, then some more text ",
  "Here's some text UNTIL for the 1st time, then some more text UNTIL for the 2nd time, and finally more text ",
]

What I could figure out already is this:

import re

re.findall(r'.+?(?=UNTIL)', txt)
# Output
[
  "Here's some text ",
  "UNTIL for the 1st time, then some more text ",
  "UNTIL for the 2nd time, and finally more text ",
]

But the result is not exactly what I need to achieve. I know I could solve this programmatically, but I am working with relatively large files, so I would be glad to solve it with only regular expressions.

Is there a way to achieve this? And if so, how?

CodePudding user response:

Solution 1

The regex you're looking for is (?:\b|^)(?=UNTIL(?=.*UNTIL))

import re

txt = "Here's some text UNTIL for the 1st time, then some more text UNTIL for the 2nd time, and finally more text UNTIL the 3rd time."

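# Zero-width split before each UNTIL that has another UNTIL later in the string
# (splitting on empty matches requires Python 3.7+).
res = re.split(r"(?:\b|^)(?=UNTIL(?=.*UNTIL))", txt)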

Solution 2

The best thing you could do here with .+?(?=UNTIL) is to take the pieces returned by re.findall(r'.+?(?=UNTIL)', txt) and join them cumulatively to get the expected format.

import re

txt = "Here's some text UNTIL for the 1st time, then some more text UNTIL for the 2nd time, and finally more text UNTIL the 3rd time."

# Each piece ends right before an occurrence of UNTIL.
arr = re.findall(r'.+?(?=UNTIL)', txt)
# Join the first i + 1 pieces to rebuild each prefix from the start of the string.
res = [''.join(arr[:i + 1]) for i in range(len(arr))]
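
For comparison, here is a minimal sketch of a variant that skips the re-joining step: it uses re.finditer to locate each occurrence of the marker and slices the original string up to that position. The helper name prefixes_until is made up for this illustration, and the sketch assumes the marker is a literal string (hence re.escape), not a pattern.

```python
import re


def prefixes_until(text, marker):
    """Return text[:i] for every index i at which marker starts.

    Hypothetical helper for illustration; marker is treated as a
    literal string, so it is escaped before being used as a pattern.
    """
    return [text[:m.start()] for m in re.finditer(re.escape(marker), text)]


txt = "Here's some text UNTIL for the 1st time, then some more text UNTIL for the 2nd time, and finally more text UNTIL the 3rd time."

res = prefixes_until(txt, "UNTIL")
for prefix in res:
    print(repr(prefix))
```

Each prefix is still a separate copy of the start of the string, so with large inputs it may be cheaper to keep only the m.start() offsets and slice on demand.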