Replace matched substring using re.sub

Time:02-16

Is there a way to replace the matched pattern substring using a single re.sub() call? What I would like to avoid is calling a string replace method on the current re.sub() output.

Input =  "/J&L/LK/Tac1_1/shareloc.pdf"

Current output using re.sub("[^0-9_]", "", input): "1_1"

Desired output in a single re.sub use: "1.1"

CodePudding user response:

According to the documentation, re.sub is defined as

re.sub(pattern, repl, string, count=0, flags=0)

If repl is a function, it is called for every non-overlapping occurrence of pattern.

That said, if you pass a lambda function, you can keep the code on one line. The function receives a match object, and the matched text is available as x[0] (group 0, the whole match).

I removed _ from the character class so that the underscore is matched as well and can be replaced with a dot, which gives the desired output.

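import re
txt = "/J&L/LK/Tac1_1/shareloc.pdf"
# Replace "_" with "." and drop every other non-digit character.
# Compare with ==, not "is": "is" tests object identity, not equality.
x = re.sub("[^0-9]", lambda m: '.' if m[0] == '_' else '', txt)
print(x)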
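If more than one character needs its own replacement, the same idea can be sketched with a lookup table instead of a conditional. This is only an illustration: the names REPLACEMENTS and keep_or_map are made up for this example and are not part of the re module.

```python
import re

# Hypothetical lookup table: non-digit characters to rewrite instead of delete.
REPLACEMENTS = {"_": "."}

def keep_or_map(match):
    # match[0] is the whole matched text (a single non-digit character here).
    # Characters missing from the table are replaced with an empty string.
    return REPLACEMENTS.get(match[0], "")

txt = "/J&L/LK/Tac1_1/shareloc.pdf"
result = re.sub(r"[^0-9]", keep_or_map, txt)
print(result)  # 1.1
```

It is still a single re.sub call; the function is just named so that more mappings can be added to the table later.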

CodePudding user response:

There is no way to use a string replacement pattern in Python re.sub to replace with two possible strings, as Python re.sub does not support a conditional replacement construct. So, either use a callable as the replacement argument or use another workaround.

It looks like you only expect one match of <DIGITS>_<DIGITS> in the input string. In this case, you can use

import re
text = "/J&L/LK/Tac1_1/shareloc.pdf"
print( re.sub(r'^.*?(\d+)_(\d+).*', r'\1.\2', text, flags=re.S) )
# => 1.1

Details:

  • ^ - start of string
  • .*? - zero or more chars as few as possible
  • (\d+) - Group 1: one or more digits
  • _ - a _ char
  • (\d+) - Group 2: one or more digits
  • .* - zero or more chars as many as possible.
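The breakdown above can also be written directly into the pattern with re.VERBOSE, which ignores whitespace and allows comments inside the regex. The sketch below is one way to lay it out; the names PATTERN and digits_with_dot are chosen for this example only. It also shows that when the input has no <DIGITS>_<DIGITS> part, re.sub returns the string unchanged rather than an empty string.

```python
import re

PATTERN = re.compile(
    r"""
    ^.*?      # start of string, then as few characters as possible
    (\d+)     # group 1: one or more digits
    _         # a literal underscore
    (\d+)     # group 2: one or more digits
    .*        # the rest of the string, as many characters as possible
    """,
    re.S | re.X,
)

def digits_with_dot(text):
    # The whole string is replaced by "group 1 . group 2" when the pattern matches.
    return PATTERN.sub(r"\1.\2", text)

matched = digits_with_dot("/J&L/LK/Tac1_1/shareloc.pdf")
unmatched = digits_with_dot("/no/digits/here.pdf")
print(matched)    # 1.1
print(unmatched)  # /no/digits/here.pdf
```

If an unchanged string is not an acceptable result for inputs without a match, check with PATTERN.search(text) first and handle that case separately.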