regular expression replace of special characters

Time:09-26

I have code like this:

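from pyspark.sql import functions as F

PATTERN = r"(?<=[du][0-9]\w{3})\sextension\s|extension\s|\sextension|extension |\sext\s|\sext|ext\s|ext|\sext\s|ext(?<=[0-9][\w])"
# PARA is an existing DataFrame with a string column "text"
Replace_Pat = PARA.withColumn("text", F.regexp_replace("text", PATTERN, "."))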

Let's say text = "This is value d8567ext67". It is correctly replaced with "This is value d8567.67", but I can't get these two cases working:

  1. A space inside the code: "This is value d8567 67" should become "This is value d8567.67". Adding \s to the pattern changes every space to ".", giving "This.is.value.d8567.67" instead.

  2. A forward slash: "This is value d8567/67" should become "This is value d8567.67".

I only want to replace these, not every special character. In plain Python I tried the following, which gives the same result for the space inside the code:

import re

# Replace up to 10 matches of the pattern with "."
txt = "The rain in Spain d0045 56 "
x = re.sub(r"(?<=[du][0-9]\w{3})\sextension\s|extension\s|\sextension|extension|\sext\s|\sext|ext\s|ext|\sext\s|\s|-|\s-|\s-\s|ext(?<=[0-9][\w])", ".", txt, count=10)
print(x)

Answer:

It's not entirely clear to me what you need, but you can try:

import re

txt = "This is value d8567ext67"

out = re.sub(r'([du]\d\w{3})\s*((?:ext(?:ension)?|/))\s*(\d\w)', r'\1.\3', txt)

Output:

>>> out
'This is value d8567.67'
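As a quick check (a sketch added here, not part of the original answer), the same call can be run on the other inputs from the question. The pattern handles "/" and "extension" with optional surrounding spaces, but because ext or / is always required between the code and the digits, a lone space such as "d8567 67" is left unchanged.

```python
import re

# The answer's pattern: group 1 keeps the code, group 3 keeps the trailing digit + word char
PATTERN = r'([du]\d\w{3})\s*((?:ext(?:ension)?|/))\s*(\d\w)'

samples = [
    "This is value d8567ext67",
    "This is value d8567/67",
    "This is value d8567 extension 67",
    "This is value d8567 67",  # no ext or / between the code and the digits
]

for s in samples:
    print(repr(re.sub(PATTERN, r'\1.\3', s)))
```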

Answer:

A few notes about the pattern that you tried:

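  • The pattern that you tried has no / alternative, so a forward slash is never matched (the - in your Python version matches a hyphen, not a slash).
  • The lookbehind in the last alternative, ext(?<=[0-9][\w]), can never succeed. It means: match ext, then assert that the two characters directly to the left are a digit followed by a word character. Those two characters are always the "xt" that was just matched, and x is not a digit.
  • The alternatives are not grouped, so each lookaround applies only to the alternative it is attached to, not to the whole pattern. That is also why the bare \s alternative in your Python version replaces every space.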
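The two lookaround points can be checked with a minimal sketch. The input "d1x y" and the tiny patterns in cases 1 and 2 are made up purely for illustration: an ungrouped alternative carries no lookaround, and a lookbehind placed right after ext only ever sees "xt".

```python
import re

# 1) Ungrouped: the lookbehind belongs only to the first alternative "x",
#    so the bare "y" alternative matches anywhere.
print(re.sub(r"(?<=d\d)x|y", ".", "d1x y"))      # d1. .

# 2) Grouped: the lookbehind now guards every alternative.
print(re.sub(r"(?<=d\d)(?:x|y)", ".", "d1x y"))  # d1. y

# 3) After matching "ext", (?<=[0-9][\w]) inspects the two characters just
#    matched, "xt"; "x" is not a digit, so this can never match.
print(re.search(r"ext(?<=[0-9][\w])", "d8567ext67"))  # None

# 4) A lookahead after "ext" tests the characters that follow instead.
print(re.search(r"ext(?=[0-9]\w)", "d8567ext67"))     # matches "ext" at span (5, 8)
```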

What you can do is add / as an alternative, put all the alternatives in a non-capturing group, and replace the lookbehind at the end with a positive lookahead placed after the group, so that it applies to every alternative instead of only the last one.

(?<=[du]\d\w{3})(?:\s?ext(?:ension)?\s?|/)(?=\s*\d\w)
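To make the pattern easier to read, here is the same regex split into commented parts with re.VERBOSE (a sketch; the name VERBOSE_PATTERN is just illustrative). In verbose mode, whitespace and # comments outside character classes are ignored, so it should behave like the one-line version.

```python
import re

# The one-line pattern above, split into commented parts with re.VERBOSE.
VERBOSE_PATTERN = re.compile(r"""
    (?<=[du]\d\w{3})          # preceded by d or u, a digit, and three word chars (e.g. d8567)
    (?:                       # non-capturing group: the lookarounds apply to every alternative
        \s?ext(?:ension)?\s?  #   ext or extension, with at most one space on each side
      | /                     #   or a single forward slash
    )
    (?=\s*\d\w)               # followed, after optional whitespace, by a digit and a word char
""", re.VERBOSE)

for s in ["This is value d8567ext67", "This is value d8567/67", "This is value d8567 extension 67"]:
    print(VERBOSE_PATTERN.sub(".", s))  # each line prints: This is value d8567.67
```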


import re

txt = "This is value d8567ext67"
x = re.sub(r"(?<=[du]\d\w{3})(?:\s?ext(?:ension)?\s?|/)(?=\s*\d\w)", ".", txt)
print(x)

Output

This is value d8567.67
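Neither pattern above covers the first case from the question, a plain space inside the code ("d8567 67"), because ext or / is always required. One possible extension, sketched here under the assumption that codes always look like d8567 followed by a digit and a word character, is to add \s+ as a further alternative inside the group. The lookbehind and lookahead still restrict it to whitespace between such a code and the digits, so ordinary spaces are left alone. Since the whitespace is now consumed inside the alternatives, the lookahead no longer needs \s*.

```python
import re

# Assumed extension of the answer's pattern:
#   \s*(?:ext(?:ension)?|/)\s*  -> ext, extension or /, with optional spaces around it
#   \s+                         -> a lone run of whitespace inside the code
PATTERN = re.compile(r"(?<=[du]\d\w{3})(?:\s*(?:ext(?:ension)?|/)\s*|\s+)(?=\d\w)")

samples = [
    "This is value d8567ext67",
    "This is value d8567 67",
    "This is value d8567/67",
    "This is value d8567 / 67",
    "The rain in Spain d0045 56 ",
]
for s in samples:
    print(repr(PATTERN.sub(".", s)))
```

Spark's regexp_replace is backed by Java regular expressions, which also accept a fixed-length lookbehind like this one, so passing PATTERN.pattern to F.regexp_replace("text", ...) should behave the same way; that is an assumption worth checking against the real column.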