In Python 3, I have a string:
CONN <DO_NOT_PRINT>user</DO_NOT_PRINT>/<DO_NOT_PRINT>password</DO_NOT_PRINT>@//host:port/service
I want to replace each character of the words between the <DO_NOT_PRINT> and </DO_NOT_PRINT> tags with an asterisk (and remove the tags), i.e.:
CONN ****/********@//host:port/service
The user and, especially, the password strings can contain any characters.
What I have so far is:
import re

z = "CONN <DO_NOT_PRINT>user</DO_NOT_PRINT>/<DO_NOT_PRINT>password</DO_NOT_PRINT>@//host:port/service"
REPLACEME = re.compile(r'<DO_NOT_PRINT>(.+)</DO_NOT_PRINT>')
found = REPLACEME.search(z)
print(found)
if found:
    old_text = found.group(1)
    new_z = z.replace(old_text, '*' * len(old_text))
    print(new_z)
else:
    print(z)
But it doesn't work correctly, as it prints:
CONN <DO_NOT_PRINT>******************************************</DO_NOT_PRINT>@//host:port/service
instead of:
CONN ****/********@//host:port/service
CodePudding user response:
Regex quantifiers are greedy by default: they match the longest text possible, so the (.+) group captures:
user</DO_NOT_PRINT>/<DO_NOT_PRINT>password
You should add the non-greedy (lazy) modifier ? after the plus:
REPLACEME = re.compile(r'<DO_NOT_PRINT>(.+?)</DO_NOT_PRINT>')
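To see the difference concretely, here is a minimal sketch comparing the greedy and lazy versions of the pattern on the string from the question. The variable names greedy_capture and lazy_capture are only illustrative.

```python
import re

z = "CONN <DO_NOT_PRINT>user</DO_NOT_PRINT>/<DO_NOT_PRINT>password</DO_NOT_PRINT>@//host:port/service"

# Greedy: .+ runs to the LAST closing tag in the string.
greedy = re.search(r'<DO_NOT_PRINT>(.+)</DO_NOT_PRINT>', z)
# Lazy: .+? stops at the FIRST closing tag it can reach.
lazy = re.search(r'<DO_NOT_PRINT>(.+?)</DO_NOT_PRINT>', z)

greedy_capture = greedy.group(1)
lazy_capture = lazy.group(1)

print(greedy_capture)  # user</DO_NOT_PRINT>/<DO_NOT_PRINT>password
print(lazy_capture)    # user
```

The greedy capture is 42 characters long, which is where the long run of asterisks in the question's output comes from.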
Your group(1) does not include the <DO_NOT_PRINT> tags themselves. If you want them to disappear as well, use group(0) to get the entire matched string. Try:
z.replace(found.group(0), '*' * len(old_text))
Edit:
If you want to replace multiple occurrences, you can use re.finditer() and do one .replace() for each match: https://docs.python.org/3/library/re.html#re.finditer
import re

z = "CONN <DO_NOT_PRINT>user</DO_NOT_PRINT>/<DO_NOT_PRINT>password</DO_NOT_PRINT>@//host:port/service"
REPLACEME = re.compile(r'<DO_NOT_PRINT>(.+?)</DO_NOT_PRINT>')
founds = REPLACEME.finditer(z)
for found in founds:
    # group(1) is the text between the tags, group(0) includes the tags
    old_text = found.group(1)
    z = z.replace(found.group(0), '*' * len(old_text))
print(z)
Or, use Viktor's answer which looks more elegant.
CodePudding user response:
You can use
import re
z="CONN <DO_NOT_PRINT>user</DO_NOT_PRINT>/<DO_NOT_PRINT>password</DO_NOT_PRINT>@//host:port/service"
REPLACEME = re.compile('<DO_NOT_PRINT>(.*?)</DO_NOT_PRINT>', re.DOTALL)
print( REPLACEME.sub(lambda x: '*' * len(x.group(1)), z) )
# => CONN ****/********@//host:port/service
NOTES:
- re.compile(r'<DO_NOT_PRINT>(.*?)</DO_NOT_PRINT>', re.DOTALL): the *? lazy quantifier makes sure the match stops at the leftmost occurrence of the right-hand delimiter, and re.DOTALL makes sure . matches line break chars, too.
- lambda x: '*' * len(x.group(1)) is the replacement argument passed to .sub, where x is the match object and x.group(1) is the Group 1 captured value, the text between the two tags.
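As a rough sketch of how the notes above fit together, the substitution can be wrapped in a small helper. The names TAGGED and mask_secrets are hypothetical, chosen for this example only, and the second input is an invented string that shows the effect of re.DOTALL on a value containing a line break.

```python
import re

# Lazy group between the tags; DOTALL lets "." match newlines as well.
TAGGED = re.compile(r'<DO_NOT_PRINT>(.*?)</DO_NOT_PRINT>', re.DOTALL)

def mask_secrets(text):
    """Replace each tagged span with one asterisk per captured character."""
    return TAGGED.sub(lambda m: '*' * len(m.group(1)), text)

single = mask_secrets(
    "CONN <DO_NOT_PRINT>user</DO_NOT_PRINT>/<DO_NOT_PRINT>password</DO_NOT_PRINT>@//host:port/service"
)
# The captured value spans two lines; the newline is masked like any other character.
multi = mask_secrets("key=<DO_NOT_PRINT>ab\ncd</DO_NOT_PRINT>;")

print(single)  # CONN ****/********@//host:port/service
print(multi)   # key=*****;
```

Because .sub handles every match in one pass, no explicit loop or str.replace call is needed.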
If you are concerned with performance, unroll the lazy dot pattern:
REPLACEME = re.compile(r'<DO_NOT_PRINT>([^<]*(?:<(?!/DO_NOT_PRINT>)[^<]*)*)</DO_NOT_PRINT>')
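re.DOTALL is not needed here, since the pattern contains no dot: the negated character class [^<] already matches line breaks.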
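A quick way to gain some confidence that the unrolled pattern behaves like the lazy one is to run both over a few inputs and compare the results. This is only a sanity check on hand-picked samples (the second and third strings are made up for illustration), not a proof of equivalence or a benchmark.

```python
import re

LAZY = re.compile(r'<DO_NOT_PRINT>(.*?)</DO_NOT_PRINT>', re.DOTALL)
UNROLLED = re.compile(
    r'<DO_NOT_PRINT>([^<]*(?:<(?!/DO_NOT_PRINT>)[^<]*)*)</DO_NOT_PRINT>'
)

def mask(pattern, text):
    # One asterisk per captured character; the tags are dropped.
    return pattern.sub(lambda m: '*' * len(m.group(1)), text)

samples = [
    "CONN <DO_NOT_PRINT>user</DO_NOT_PRINT>/<DO_NOT_PRINT>password</DO_NOT_PRINT>@//host:port/service",
    # Secret containing "<", a newline, and a look-alike "</" sequence.
    "<DO_NOT_PRINT>p<ss\nw</rd</DO_NOT_PRINT>",
    "no tags here",
]

results = [(mask(LAZY, s), mask(UNROLLED, s)) for s in samples]
for lazy_out, unrolled_out in results:
    print(lazy_out == unrolled_out)  # True for each sample
```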