I have a template, and I need to replace part of it using a regex in Python. Here is my template (note that there is at least one newline between the two comments):
hello
how's everything
<!--POSTS:START-->
some text
<!--POSTS:END-->
Some code here
I want to replace everything between <!--POSTS:START--> and <!--POSTS:END--> in Python. So I wrote the pattern <!--POSTS:START-->\n([^;]*)\n<!--POSTS:END-->, but the match includes <!--POSTS:START--> and <!--POSTS:END--> too.
Here is what I want:
re.sub('...', 'foo', message)
# expected result:
hello
how's everything
<!--POSTS:START-->
foo
<!--POSTS:END-->
Some code here
Thanks.
CodePudding user response:
You can put the start and end markers in capture groups and reference them as \1 and \2 in the replacement string.
If the text has multiple occurrences of <!--POSTS:START-->...<!--POSTS:END-->, the non-greedy .*? replaces the content of each block separately. If the '?' is removed, the greedy .* matches from the first start marker to the last end marker, so everything in between is replaced at once.
Try this:
import re
s = '''
hello
how's everything
<!--POSTS:START-->
some text
<!--POSTS:END-->
Some code here
'''
# re.DOTALL lets '.' match newlines, so the content between the markers may span several lines
s = re.sub(r'(<!--POSTS:START-->\n).*?(\n<!--POSTS:END-->)', r'\1foo\2', s, flags=re.DOTALL)
# this inlines the DOTALL flag in the regexp for same result
# s = re.sub(r'(?s)(<!--POSTS:START-->\n).*?(\n<!--POSTS:END-->)', r'\1foo\2', s)
print(s)
Output:
hello
how's everything
<!--POSTS:START-->
foo
<!--POSTS:END-->
Some code here
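To make the greedy versus non-greedy point above concrete, here is a small sketch. The two-block sample text and the variable names are made up for illustration; only the patterns come from this answer.

```python
import re

# Hypothetical input with two marked blocks and unrelated text between them
text = (
    "<!--POSTS:START-->\n"
    "first\n"
    "<!--POSTS:END-->\n"
    "middle\n"
    "<!--POSTS:START-->\n"
    "second\n"
    "<!--POSTS:END-->\n"
)

lazy_pattern = r'(<!--POSTS:START-->\n).*?(\n<!--POSTS:END-->)'
greedy_pattern = r'(<!--POSTS:START-->\n).*(\n<!--POSTS:END-->)'

# Non-greedy: each block is matched and replaced on its own
lazy = re.sub(lazy_pattern, r'\1foo\2', text, flags=re.DOTALL)

# Greedy: one match from the first start marker to the last end marker,
# so "middle" and the inner markers are swallowed as well
greedy = re.sub(greedy_pattern, r'\1foo\2', text, flags=re.DOTALL)

print(lazy)
print(greedy)
```

With the non-greedy pattern both blocks end up containing foo and the "middle" line survives; with the greedy pattern only a single block remains.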
CodePudding user response:
See the re module documentation: https://docs.python.org/3/library/re.html
import re
pattern = r"(<!--POSTS:START-->\n).*(\n<!--POSTS:END-->)"
string = """hello
how's everything
<!--POSTS:START-->
some text
<!--POSTS:END-->
Some code here"""
result = re.sub(pattern, r"\g<1>foo\g<2>", string)
print(result)
result:
hello
how's everything
<!--POSTS:START-->
foo
<!--POSTS:END-->
Some code here
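One caveat worth checking: this pattern is used without re.DOTALL, so '.' does not match a newline and the substitution only applies when the content between the markers is a single line. A minimal sketch, assuming a made-up two-line block:

```python
import re

pattern = r"(<!--POSTS:START-->\n).*(\n<!--POSTS:END-->)"

# Hypothetical template whose marked block spans two lines
multi = "<!--POSTS:START-->\nline one\nline two\n<!--POSTS:END-->"

# Without DOTALL the pattern cannot cross the inner newline, so nothing is replaced
no_flag = re.sub(pattern, r"\g<1>foo\g<2>", multi)

# With DOTALL the whole block content is replaced
with_flag = re.sub(pattern, r"\g<1>foo\g<2>", multi, flags=re.DOTALL)

print(no_flag == multi)
print(with_flag)
```

Note that with re.DOTALL the greedy .* would span several blocks if the template contained more than one, so .*? is the safer choice in that case.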
CodePudding user response:
You can use the following:
import re
new_content = re.sub(
r'(<!--POSTS:START-->\n).*?(?=\n<!--POSTS:END-->)', r"\1foo",
content, flags=re.DOTALL)
The DOTALL flag makes the '.' special character match any character at all, including a newline.
I'm using two things to do what you want:
- Lookahead (?=...): asserts that the given subpattern can be matched here, without consuming characters.
- Non-greedy quantifier (*?): matches as little as possible, so each block is matched separately.
Because of the lookahead, \n<!--POSTS:END--> is not consumed, so I only need to keep the first group and rewrite the content between the markers. That is why I'm using \1foo and not \1foo\2.
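If you would rather pass a plain 'foo' as the replacement, as in the question, one possible variation is to turn the start marker into a lookbehind as well. This is only a sketch: it relies on the start marker plus newline being a fixed-length string, which Python's re module requires for lookbehind, and the sample content is made up.

```python
import re

# Hypothetical template, shortened from the one in the question
content = "hello\n<!--POSTS:START-->\nsome text\n<!--POSTS:END-->\nSome code here"

# Neither marker is consumed: the lookbehind checks the start marker,
# the lookahead checks the end marker, and only the text between them matches
pattern = r'(?<=<!--POSTS:START-->\n).*?(?=\n<!--POSTS:END-->)'

result = re.sub(pattern, 'foo', content, flags=re.DOTALL)
print(result)
```

Since nothing outside the block content is matched, no group references are needed in the replacement string.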
If you need to modify only the first match, you can pass count=1:
re.sub(..., count=1)
You can have anything between those two marker lines and it will work as expected.
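As a quick illustration of count=1, here is a sketch with a made-up input containing two blocks; the pattern is the one from this answer.

```python
import re

# Hypothetical content with two marked blocks
content = (
    "<!--POSTS:START-->\na\n<!--POSTS:END-->\n"
    "<!--POSTS:START-->\nb\n<!--POSTS:END-->"
)

pattern = r'(<!--POSTS:START-->\n).*?(?=\n<!--POSTS:END-->)'

# count=1 stops after the first substitution, leaving the second block untouched
first_only = re.sub(pattern, r'\1foo', content, count=1, flags=re.DOTALL)

# Without count, every block is rewritten
every_block = re.sub(pattern, r'\1foo', content, flags=re.DOTALL)

print(first_only)
print(every_block)
```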