Appending text at the end of each bullet point

I have a multiline string of the following form:

Front

(A) Text1.

(A) Text2.

(A) Text3.

(A) Text4.

(A) Text5.

End

Note that Text1, Text2, etc. may contain line breaks. I want to append the string END after each of Text1, Text2, etc.

Let c denote the multiline string above. I tried to do this with re.sub:

c = re.sub(r"\(A\)(.*?)\n\n\(A\)", r"(A)\1 END\n\n(A)", c, flags=re.DOTALL)

However, this only replaces every odd-numbered point. Here is the output:

Front

(A) Text1. END

(A) Text2.

(A) Text3. END

(A) Text4.

(A) Text5.

End

The last bullet point can be handled as a special case. I'm more concerned that only every other bullet point gets END appended. I believe this is because when the second (A) is used as the endpoint of re.sub, Python excludes it from being a starting point for the next match.

How can I resolve this?

CodePudding user response:

Python's regular expressions support lookahead, which is good for your use case:

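c = re.sub(r"\(A\)(.*?)\n\n(?=\(A\))", r"(A)\1 END\n\n", c, flags=re.DOTALL)

A lookahead, written (?=...), checks that the enclosed pattern matches at the current position but does not include it in the matched span (it is a zero-width assertion). The next (A) is therefore not consumed and can still start the following match.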
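To see why the original pattern skipped every other bullet, and what the lookahead changes, here is a minimal sketch on a made-up four-bullet string (the variable names are illustrative only). re.findall returns the captured group of each non-overlapping match, so it shows which bullets each pattern actually finds.

```python
import re

# Hypothetical test string: four single-line bullets separated by blank lines.
text = "(A) one\n\n(A) two\n\n(A) three\n\n(A) four"

# Consuming pattern: the trailing "(A)" is part of each match, so the scan
# resumes after it and that bullet can no longer start a match of its own.
consuming = re.findall(r"\(A\)(.*?)\n\n\(A\)", text, flags=re.DOTALL)

# Lookahead pattern: the trailing "(A)" is only checked, not consumed,
# so the next match can begin exactly there.
with_lookahead = re.findall(r"\(A\)(.*?)\n\n(?=\(A\))", text, flags=re.DOTALL)

print(consuming)       # [' one', ' three']
print(with_lookahead)  # [' one', ' two', ' three']
```

In both cases the last bullet is not found, because nothing follows it that the pattern requires.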

Sample:

import re

c = """Front

(A) Text1.
Foo.
Bar.

(A) Text2.
Some extra text and a fake bullet (A)
More text

(A) Text3.

(A) Text4.

(A) Text5.

End"""

c = re.sub(r"\(A\)(.*?)\n\n(?=\(A\))", r"(A)\1 END\n\n", c, flags=re.DOTALL)

print(c)

prints

Front

(A) Text1.
Foo.
Bar. END

(A) Text2.
Some extra text and a fake bullet (A)
More text END

(A) Text3. END

(A) Text4. END

(A) Text5.

End

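CodePudding user response:

I used this regex to match each (A) together with the rest of its line:

r"\(A\).*"

I then used a custom replacement function that returns the matched text with " END" appended. Because . does not match a newline unless re.DOTALL is set, this only works as intended when each bullet fits on a single line.
Here is the code: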

import re

c = """Front

(A) Text1.

(A) Text2.

(A) Text3.

(A) Text4.

(A) Text5.

End"""

def rep(m):
    return m.group(0) + " END"

c = re.sub(r"\(A\).*", repl=rep, string=c)

print(c)

Output:

Front

(A) Text1. END

(A) Text2. END

(A) Text3. END

(A) Text4. END

(A) Text5. END

End
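If the bullets can span several lines, as the question says they may, one possible variation is to anchor (A) at the start of a line and let the match run lazily up to the next blank line or the end of the string. This is only a sketch under the assumption that bullets are separated by blank lines and never contain one; the sample string and the names pattern and result are made up for illustration.

```python
import re

# Hypothetical input: a multi-line bullet, and a last bullet that ends the string.
sample = """Front

(A) Text1.
Foo.

(A) Text2.

(A) Text3."""

# re.MULTILINE lets ^ match at the start of every line;
# re.DOTALL lets . cross the single line breaks inside a bullet.
# The lookahead stops the lazy match at the first blank line or at the end (\Z).
pattern = re.compile(r"^(\(A\).*?)(?=\n\n|\Z)", flags=re.DOTALL | re.MULTILINE)

result = pattern.sub(r"\1 END", sample)
print(result)
```

Here END lands after "Foo." rather than after "Text1.", and the last bullet is covered by the \Z alternative without a special case.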

CodePudding user response:

You can modify your regex to use a lookahead and a lookbehind, both of which are zero-width (i.e. they do not consume characters), to get around the issue you described:

"I believe this is because when the second (A) is used as the endpoint of re.sub"

c = re.sub(r"(?<=\(A\))(.*?)(?=\n\n\(A\)|\n\nEnd)", r"\1 END", c, flags=re.DOTALL)

print(c)

Output:

Front

(A) Text1. END

(A) Text2. END

(A) Text3. END

(A) Text4. END

(A) Text5. END

End