Python regex replace only subgroup

Time:12-06

I would like to wrap every word that occurs directly after from or on as (1_word), and leave all other occurrences of that word unchanged.

input

with crossroad
from crossroad
(on pike)

expected output

with crossroad
from (1_crossroad)
(on (1_pike))

Code I tried :

rgxsubtable = re.compile(r"(?:from|on)[\s]+([\w\d.\"]+)",re.MULTILINE|re.IGNORECASE) # find the occurrences to change
tlist = set(rgxsubtable.findall(input))

for item in tlist:
    input = re.sub(r"(?!\B\w){0}(?<!\w\B)".format(re.escape(item)),"(1_{0})".format(item),input )

I know this replaces both occurrences of crossroad instead of only the one after "from", but I don't know how to selectively replace only the word after from/on.

output obtained

with (1_crossroad)
from (1_crossroad)
(on (1_pike))

CodePudding user response:

My solution to this would be the following:

import re
from typing import List

string = """with crossroad
from crossroad
(on pike)"""

exclusion_list: List[str] = ["\n", "\\)"]
string = re.sub(fr"from ([^{''.join(exclusion_list)}]*)", r"from (1_\g<1>)", string)
string = re.sub(fr"on ([^{''.join(exclusion_list)}]*)", r"on (1_\g<1>)", string)
print(string)

Output:

with crossroad
from (1_crossroad)
(on (1_pike))

This assumes there is a \n character between lines, so that the whole expression after the keyword is captured. Each re.sub call replaces every occurrence of one case, either from XXX or on XXX.
Note that this works for the sample input but might break in other cases: for instance, with [on pike] the resulting line would be [on (1_pike]). You might want to add more characters to the exclusion list.
The exclusion list is inserted into the pattern using a formatted (f) raw (r) string. The group captures everything on the line until one of the excluded characters is encountered.
This has one major consequence: the characters in the exclusion list need to be properly escaped. For instance, to capture only the first word after from and on, you would add whitespace as an excluded character. The pattern itself needs \s, so the list entry must be written as \\s (the list entries are ordinary string literals, so the backslash is doubled to leave a single backslash in the pattern).
Finally, the same exclusion list is used here for both cases; you can of course use two different lists.
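A minimal sketch of extending the exclusion list as described above, assuming you want the capture to stop at a closing bracket and at any whitespace as well. The helper name wrap_after, the \b word boundary, and the extra sample lines are illustrative additions, not part of the code above.

```python
import re
from typing import List


def wrap_after(keyword: str, text: str, excluded: List[str]) -> str:
    """Wrap the text following `keyword ` in (1_...), stopping at an excluded character.

    Hypothetical helper for illustration. Entries in `excluded` must already be
    escaped for use inside a regex character class.
    """
    char_class = "".join(excluded)
    # \b keeps the keyword from matching inside a longer word such as "button".
    pattern = rf"\b({re.escape(keyword)}) ([^{char_class}]+)"
    # Group 1 is the keyword itself, group 2 is the text to wrap.
    return re.sub(pattern, r"\g<1> (1_\g<2>)", text)


# "\\]" stops at a closing bracket, "\\s" stops at the first whitespace.
excluded = ["\n", "\\)", "\\]", "\\s"]

result = "with crossroad\nfrom crossroad\n(on pike)\n[on pike]\nfrom main street"
for keyword in ("from", "on"):
    result = wrap_after(keyword, result, excluded)

print(result)
```

With "\\s" in the list only the first word after the keyword is wrapped, so from main street becomes from (1_main) street; drop that entry to capture up to the end of the line instead.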

CodePudding user response:

You can use a positive lookbehind to match from and on without including them in the match.

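The regex (?:(?<=from )|(?<=on ))([^\s]+) matches any run of non-whitespace characters after from or on, so that you can replace it. Python's re module requires fixed-width lookbehinds, so (?<=(from|on) ) has to be split into one lookbehind per keyword.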
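A short sketch of the lookbehind approach in Python. The built-in re module requires each lookbehind to have a fixed width, and from and on differ in length, so this example uses one lookbehind per keyword. It also assumes \w+ is an acceptable definition of a "word", which keeps the closing parenthesis in (on pike) outside the match. The variable names are illustrative.

```python
import re

text = "with crossroad\nfrom crossroad\n(on pike)"

# A single lookbehind covering both keywords has variable width (5 or 3 characters),
# which the built-in re module is expected to reject at compile time.
try:
    re.compile(r"(?<=(from|on) )(\S+)")
    variable_width_ok = True
except re.error:
    variable_width_ok = False

# One fixed-width lookbehind per keyword; \b stops "on" matching inside a longer word.
lookbehind = re.compile(r"(?:(?<=\bfrom )|(?<=\bon ))(\w+)", re.IGNORECASE)
result_lookbehind = lookbehind.sub(r"(1_\g<1>)", text)

# Alternative without lookbehind: capture the keyword and write it back.
captured = re.compile(r"\b(from|on)\s+(\w+)", re.IGNORECASE)
result_captured = captured.sub(r"\g<1> (1_\g<2>)", text)

print(result_lookbehind)
print(result_captured)
```

Both variants leave with crossroad untouched. The capture-group version also tolerates several spaces or tabs after the keyword, at the cost of collapsing them to a single space in the output.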

