I would like to wrap every word that occurs after from/on as (1_word), but only the occurrences that directly follow from or on.
input
with crossroad
from crossroad
(on pike)
expected output
with crossroad
from (1_crossroad)
(on (1_pike))
Code I tried :
rgxsubtable = re.compile(r"(?:from|on)[\s]+([\w\d.\"]+)",re.MULTILINE|re.IGNORECASE) # find the occurrences to change
tlist = set(rgxsubtable.findall(input))
for item in tlist:
input = re.sub(r"(?!\B\w){0}(?<!\w\B)".format(re.escape(item)),"(1_{0})".format(item),input )
I know this replaces both occurrences of crossroad instead of only the one after "from", but I don't know how to selectively replace only the word that follows from/on.
output obtained
with (1_crossroad)
from (1_crossroad)
(on (1_pike))
CodePudding user response:
My solution to this would be the following:
import re
from typing import List
string = """with crossroad
from crossroad
(on pike)"""
exclusion_list: List[str] = ["\n", "\\)"]
string = re.sub(fr"from ([^{''.join(exclusion_list)}]*)", r"from (1_\g<1>)", string)
string = re.sub(fr"on ([^{''.join(exclusion_list)}]*)", r"on (1_\g<1>)", string)
print(string)
Output:
with crossroad
from (1_crossroad)
(on (1_pike))
This assumes there is a \n character between lines, so that the whole expression after the keyword is captured. In this code, each re.sub call replaces every occurrence of one case, either from XXX or on XXX.
Additionally, note that this works in this case but might break in others: for instance, if you had [on pike], the resulting line would be [on (1_pike]). You might want to add some characters to the exclusion list.
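As a rough illustration of the bracket case, the sketch below wraps the two re.sub calls in a helper and compares the original exclusion list with one that also excludes the closing bracket. The helper name wrap_after_keywords and the extra "\\]" entry are assumptions made for this example, not part of the answer's code.

```python
import re

def wrap_after_keywords(text, excluded):
    # `excluded` holds already-escaped characters that end the captured span.
    char_class = "".join(excluded)
    for keyword in ("from", "on"):
        text = re.sub(
            fr"{keyword} ([^{char_class}]*)",
            fr"{keyword} (1_\g<1>)",
            text,
        )
    return text

# Original list: the closing bracket is swallowed into the capture.
broken = wrap_after_keywords("[on pike]", ["\n", "\\)"])
# Extended list (assumption): "\\]" stops the capture before the bracket.
fixed = wrap_after_keywords("[on pike]", ["\n", "\\)", "\\]"])

print(broken)  # [on (1_pike])
print(fixed)   # [on (1_pike)]
```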
The exclusion list is then added to the pattern by using a formatted (f) raw (r) string. This captures everything on the line until one of the excluded characters is encountered.
This has one major consequence: the characters in the exclusion list need to be properly escaped. For instance, if you wanted to capture only the first word after from and on, you would want to add whitespace as an excluded character. For this, the pattern itself needs to contain \s, so we need to add \\s to the list (the double escape in the Python string yields a single backslash in the pattern).
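A minimal sketch of that first-word variant, assuming the sample text below (which is made up for illustration): with "\\s" in the list, the capture stops at the first whitespace character instead of running to the end of the line.

```python
import re

# Hypothetical input with more than one word after each keyword.
text = "from crossroad to pike\n(on pike road)"

# "\\s" in Python source becomes \s inside the character class.
exclusion_list = ["\\s", "\\)"]
char_class = "".join(exclusion_list)

first_word_only = re.sub(fr"from ([^{char_class}]*)", r"from (1_\g<1>)", text)
first_word_only = re.sub(fr"on ([^{char_class}]*)", r"on (1_\g<1>)", first_word_only)

print(first_word_only)
# from (1_crossroad) to pike
# (on (1_pike) road)
```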
Finally, we use the same exclusion list for both cases here; you can of course use two different lists.
CodePudding user response:
You can use a positive look-behind to match from and on without including them in the match. The regex (?<=(from|on) )([^\s]+) will match any string after from and on, so that you can replace it.
You can see it in action here
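One caveat when porting this to Python: the built-in re module only accepts fixed-width look-behinds, so an alternation of different lengths such as (from|on) inside (?<=...) raises an error there. A possible workaround, sketched below under that assumption, is one look-behind per keyword. The \b anchors and the excluded closing parenthesis are additions for this example, so that "on" is not matched inside a longer word and (on pike) keeps its own parenthesis.

```python
import re

text = """with crossroad
from crossroad
(on pike)"""

# One fixed-width look-behind per keyword; \b keeps "on" from matching
# at the end of a longer word.
pattern = re.compile(r"(?:(?<=\bfrom )|(?<=\bon ))([^\s)]+)", re.IGNORECASE)

# Only the word after the keyword is matched, so only it is replaced.
result = pattern.sub(r"(1_\1)", text)
print(result)
# with crossroad
# from (1_crossroad)
# (on (1_pike))
```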