Regular expression for substitution of similar pattern in a string in Python

Time:10-27

I want to use a regular expression to detect and substitute some phrases. These phrases follow the same pattern but deviate at some points. All the phrases are in the same string.

For instance I have this string:

/this/is//an example of what I want /to///do

I want to match every word that is attached to slashes, together with the slashes themselves, and replace each match with "".

To solve this, I used the following code:

import re
txt = "/this/is//an example of what i want /to///do"
re.search("/.*/",txt1, re.VERBOSE)
pattern1 = r"/.*?/\w+"
a = re.sub(pattern1,"",txt)

The result is:

' example of what i want '

which is what I want, that is, to substitute the phrases within // with "". But when I run the same pattern on the following sentence

"/this/is//an example of what i want to /do"

I get

' example of what i want to /do'

How can I use one regex and remove all the phrases and //, irrespective of the number of // in a phrase?

CodePudding user response:

You can use

/(?:[^/\s]*/)*\w+

See the regex demo. Details:

  • / - a slash
  • (?:[^/\s]*/)* - zero or more repetitions of: zero or more chars other than a slash and whitespace, followed by a slash
  • \w+ - one or more word chars.

See the Python demo:

import re
rx = re.compile(r"/(?:[^/\s]*/)*\w+")
texts = ["/this/is//an example of what I want /to///do", "/this/is//an example of what i want to /do"]
for text in texts:
    print(rx.sub('', text).strip())
# => example of what I want
#    example of what i want to
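
To see why the pattern from the question misses the trailing /do, it can help to list what each pattern actually matches instead of what it removes. The following is a minimal sketch using re.findall; the variable names are only illustrative and are not part of the original code.

```python
import re

# Pattern from the question: requires a second slash before the word chars.
original = re.compile(r"/.*?/\w+")
# Pattern from this answer: the repeated "chars + slash" group is optional.
revised = re.compile(r"/(?:[^/\s]*/)*\w+")

text = "/this/is//an example of what i want to /do"

# Neither pattern has a capturing group, so findall returns whole matches.
original_matches = original.findall(text)
revised_matches = revised.findall(text)

print(original_matches)  # ['/this/is', '//an']
print(revised_matches)   # ['/this/is//an', '/do']
```

The original pattern finds nothing at /do because it needs two slashes in every match, while the revised pattern accepts a single slash followed by word chars.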

CodePudding user response:

In your example code, you can omit the line re.search("/.*/",txt1, re.VERBOSE): it runs the search, but you are not doing anything with the result. It also refers to txt1, which is never defined (the variable is named txt).

You can match one or more / followed by one or more word chars:

/+\w+

Or, for a slightly broader match, match one or more / followed by one or more chars other than / or a whitespace char:

/+[^\s/]+
  • /+ Match 1 or more occurrences of /
  • [^\s/]+ Match 1 or more occurrences of any char except a whitespace char or /

Regex demo

import re

strings = [
    "/this/is//an example of what I want /to///do",
    "/this/is//an example of what i want to /do"
]

for txt in strings:
    pattern1 = r"/+[^\s/]+"
    a = re.sub(pattern1, "", txt)
    print(a)

Output

 example of what I want 
 example of what i want to 
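
The two patterns in this answer give the same result on the strings from the question, so the difference only shows up when a slash-prefixed token contains non-word characters. A small sketch of that difference follows; the sample string is made up for illustration and does not come from the question.

```python
import re

# Hypothetical input: tokens after the slashes contain "-" and ".".
sample = "/foo-bar baz //qux.txt end"

# \w+ stops at the first non-word char, so "-bar" and ".txt" are left behind.
narrow = re.sub(r"/+\w+", "", sample)
# [^\s/]+ runs up to the next whitespace or slash, so the whole token goes.
broad = re.sub(r"/+[^\s/]+", "", sample)

print(repr(narrow))  # '-bar baz .txt end'
print(repr(broad))   # ' baz  end'
```

Which one fits depends on whether the slash-prefixed phrases can contain punctuation; for purely alphanumeric words, both behave the same.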