I have code like this:
from pyspark.sql import functions as F
PATTERN = r"(?<=[du][0-9]\w{3})\sextension\s|extension\s|\sextension|extension |\sext\s|\sext|ext\s|ext|\sext\s|ext(?<=[0-9][\w])"
Replace_Pat = PARA.withColumn("text", F.regexp_replace("text", PATTERN, "."))
Let's say text = "This is value d8567ext67"; it is correctly replaced with "This is value d8567.67". However, I can't manage to replace only the space inside the code, i.e. turn "This is value d8567 67" into "This is value d8567.67". If I add a whitespace alternative, every space is changed to ".", giving "This.is.value.d8567.67". I also want the forward slash handled, i.e. "This is value d8567/67" should become "This is value d8567.67".
I only want to replace these separators, not every special character. In Python I tried the following, which gave the same result for the space inside the code:
import re
# Replace up to 10 matches of the pattern with "."
txt = "The rain in Spain d0045 56 "
x = re.sub(r"(?<=[du][0-9]\w{3})\sextension\s|extension\s|\sextension|extension|\sext\s|\sext|ext\s|ext|\sext\s|\s|-|\s-|\s-\s|ext(?<=[0-9][\w])", '.', txt, count=10)
print(x)
Answer:
It's not entirely clear to me what you need, but you can try:
import re
txt = "This is value d8567ext67"
out = re.sub(r'([du]\d\w{3})\s*((?:ext(?:ension)?|/))\s*(\d\w)', r'\1.\3', txt)
Output:
>>> out
'This is value d8567.67'
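A quick check of which separators this pattern covers. The sample strings other than the first are variants of the ones in the question (the "extension" one is hypothetical). Because the pattern requires ext, extension or / between the code and the digits, a plain space is not matched and the last sample is left unchanged:

```python
import re

# Pattern from the answer above: group 1 is the code (e.g. d8567), group 3 the
# two characters after the separator; the separator must be ext, extension or /.
PATTERN = r'([du]\d\w{3})\s*((?:ext(?:ension)?|/))\s*(\d\w)'

samples = [
    "This is value d8567ext67",
    "This is value d8567 extension 67",
    "This is value d8567/67",
    "This is value d8567 67",
]

# Map each input to its replaced output.
results = {s: re.sub(PATTERN, r'\1.\3', s) for s in samples}
for before, after in results.items():
    print(f"{before!r} -> {after!r}")
```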
Answer:
A few notes about the pattern that you tried:
- The pattern that you tried does not contain a / to match.
- The lookbehind in the last alternative, ext(?<=[0-9][\w]), will always fail. It means: match ext, then assert that directly to the left there is a digit followed by a word character. But at that point the two characters directly to the left are "xt", which contain no digit.
- The alternatives are not grouped, so each lookaround applies only to the alternative it is part of, not to the whole pattern.
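The last two points can be seen with simplified patterns. This is a minimal sketch: the patterns below are hypothetical reductions of the one in the question, chosen only to isolate the effect of grouping and of a lookbehind placed after ext.

```python
import re

txt = "This is value d8567 67"

# Ungrouped alternation: the lookbehind belongs only to the first alternative,
# so the bare \s alternative matches every space in the string.
ungrouped_result = re.sub(r"(?<=[du]\d\w{3})ext|\s", ".", txt)
print(ungrouped_result)  # This.is.value.d8567.67

# Grouped alternation: the lookbehind and lookahead sit outside the group,
# so they constrain both "ext" and \s.
grouped_result = re.sub(r"(?<=[du]\d\w{3})(?:ext|\s)(?=\d\w)", ".", txt)
print(grouped_result)  # This is value d8567.67

# A lookbehind placed after "ext" inspects the two characters just matched ("xt"),
# so [0-9] can never succeed there and the search finds nothing.
never_matches = re.search(r"ext(?<=[0-9][\w])", "d8567ext67")
print(never_matches)  # None
```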
What you can do is add / as an alternative, group the alternatives, and turn the lookbehind at the end into a positive lookahead outside the group, so that it applies to all the alternatives instead of only the last one:
(?<=[du]\d\w{3})(?:\s?ext(?:ension)?\s?|/)(?=\s*\d\w)
import re
txt = "This is value d8567ext67"
x = re.sub(r"(?<=[du]\d\w{3})(?:\s?ext(?:ension)?\s?|/)(?=\s*\d\w)", '.', txt)
print(x)
Output
This is value d8567.67
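Neither pattern replaces a plain space between the code and the digits ("This is value d8567 67"), which the question also asks for. Because the lookarounds now sit outside the group, a bare whitespace alternative can be added without touching the other spaces in the string. The sketch below is a hypothetical extension, not part of the answer above: it adds \s+ as a third alternative, lets whitespace around / be consumed too, and moves the optional whitespace out of the lookahead into the alternatives, so the lookahead is just (?=\d\w). The name normalize_separator is illustrative only.

```python
import re

# Hypothetical extension of the grouped pattern: ext/extension, / or a bare run
# of whitespace. The fixed-width lookbehind and the lookahead apply to all three
# alternatives, so only a separator directly after a code like d8567 and directly
# before two characters starting with a digit is replaced.
PATTERN = r"(?<=[du]\d\w{3})(?:\s*ext(?:ension)?\s*|\s*/\s*|\s+)(?=\d\w)"


def normalize_separator(text: str) -> str:
    """Replace the separator between a code such as d8567 and the following digits with '.'."""
    return re.sub(PATTERN, ".", text)


samples = [
    "This is value d8567ext67",
    "This is value d8567 extension 67",
    "This is value d8567/67",
    "This is value d8567 67",
    "The rain in Spain d0045 56 ",
]
for s in samples:
    print(repr(normalize_separator(s)))
```

In PySpark, the same pattern string could presumably be passed as F.regexp_replace("text", PATTERN, "."), since Spark uses Java regular expressions, which also support a fixed-width lookbehind, non-capturing groups and lookahead; this has not been tested against Spark here.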