How can I replace the substring between page1/ and _type-A with 222.6 in the string l below?
l = 'https://homepage.com/home/page1/222.6 a_type-A/go'
replace_with = '222.6'
Expected result:
https://homepage.com/home/page1/222.6_type-A/go
I tried:
import re
re.sub('page1/.*?_type-A','',l, flags=re.DOTALL)
But it also removes page1/ and _type-A.
CodePudding user response:
You can use
import re
l = 'https://homepage.com/home/page1/222.6 a_type-A/go'
replace_with = '222.6'
print (re.sub('(page1/).*?(_type-A)',fr'\g<1>{replace_with}\2',l, flags=re.DOTALL))
Output: https://homepage.com/home/page1/222.6_type-A/go
Note that you used an empty string as the replacement argument, which is why the whole match, including page1/ and _type-A, was removed. In the above snippet, the parts before and after .*? are captured: \g<1> refers to the first group value, and \2 refers to the second group value in the replacement pattern. The unambiguous backreference form (\g<X>) is used to avoid backreference issues, since a digit comes right after the backreference.
Since replace_with contains no backslashes, there is no need to preprocess (escape) anything in it.
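To see why the \g<1> form matters, here is a small sketch that compares it with a plain \1 placed directly before the digits of the replacement. It also shows a function replacement, which is one way to avoid escaping altogether if replace_with could ever contain backslashes. The helper name replace_between is hypothetical and not part of the re module.

```python
import re

l = 'https://homepage.com/home/page1/222.6 a_type-A/go'
replace_with = '222.6'
pattern = r'(page1/).*?(_type-A)'

# Unambiguous form: \g<1> ends at '>', so the digits that follow are literal text.
good = re.sub(pattern, fr'\g<1>{replace_with}\2', l, flags=re.DOTALL)
print(good)

# Ambiguous form: \1 is immediately followed by "222.6", so the template parser
# no longer reads it as "group 1". This yields a wrong result or an re.error.
try:
    bad = re.sub(pattern, fr'\1{replace_with}\2', l, flags=re.DOTALL)
except re.error as exc:
    bad = f're.error: {exc}'
print(bad)


def replace_between(text, start, end, replacement):
    """Hypothetical helper: replace whatever lies between start and end.

    A function replacement is used, so backslashes in `replacement`
    are inserted literally instead of being parsed as escapes.
    """
    regex = re.compile(f'({re.escape(start)}).*?({re.escape(end)})', re.DOTALL)
    return regex.sub(lambda m: m.group(1) + replacement + m.group(2), text)


print(replace_between(l, 'page1/', '_type-A', replace_with))
print(replace_between(l, 'page1/', '_type-A', r'a\b'))
```

With a function as the replacement, re.sub inserts the returned string as-is, so no escape processing is applied to it.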
CodePudding user response:
You may use re.sub like this:
import re
l = 'https://homepage.com/home/page1/222.6 a_type-A/go'
replace_with = '222.6'
print (re.sub(r'(?<=page1/).*?(?=_type-A)', replace_with, l))
Output:
https://homepage.com/home/page1/222.6_type-A/go
RegEx Breakup:
(?<=page1/): Lookbehind to assert that we have page1/ immediately before the current position
.*?: Match 0 or more of any character except a newline (lazy)
(?=_type-A): Lookahead to assert that we have _type-A immediately after the current position
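One caveat with the lookbehind approach: Python's built-in re module only accepts fixed-width lookbehinds. The sketch below assumes a hypothetical variant of the task in which the page number varies (page1/, page23/, and so on). In that case a capture group can stand in for the lookbehind.

```python
import re

l = 'https://homepage.com/home/page1/222.6 a_type-A/go'
replace_with = '222.6'

# Fixed-width lookbehind: accepted, because 'page1/' always has the same length.
print(re.sub(r'(?<=page1/).*?(?=_type-A)', replace_with, l))

# Variable-width lookbehind (hypothetical requirement): re rejects it at compile time.
try:
    re.compile(r'(?<=page\d+/).*?(?=_type-A)')
    lookbehind_ok = True
except re.error as exc:
    lookbehind_ok = False
    print('re.error:', exc)

# Alternative: capture the variable-width prefix and put it back via a function.
result = re.sub(r'(page\d+/).*?(?=_type-A)',
                lambda m: m.group(1) + replace_with, l)
print(result)
```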
CodePudding user response:
This works:
import re
l = 'https://homepage.com/home/page1/222.6 a_type-A/go'
pattern = r"(?<=page1/).*?(?=_type)"
replace_with = '222.6'
s = re.sub(pattern, replace_with, l)
print(s)
The pattern uses positive lookbehind and lookahead assertions, (?<=...) and (?=...). A match only occurs if the string is preceded and followed by the text given in the assertions, but the match does not consume that text. This means that re.sub looks for a string with page1/ before it and _type after it, but replaces only the part in between.
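To check that the assertions are not consumed, you can compare what re.search reports as the matched text for the lookaround pattern with what it reports for a plain pattern without lookarounds. A minimal sketch using the same string:

```python
import re

l = 'https://homepage.com/home/page1/222.6 a_type-A/go'

# With lookarounds, only the text in between is part of the match.
inner = re.search(r'(?<=page1/).*?(?=_type)', l).group()
print(repr(inner))

# Without lookarounds, the delimiters are part of the match too,
# so re.sub would replace them along with the text in between.
whole = re.search(r'page1/.*?_type', l).group()
print(repr(whole))
```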