Hi, I'm looking for a regular expression that lets me not only replace characters but also annotate each replacement with the number of consecutive occurrences.
For example, I would like to replace all special characters with "s", all letters with "c" and all digits with "d", and put the length of each run between "{}".
If I have "123-45AB-78!£", I would like to get d{3}s{1}d{2}c{2}s{1}d{2}s{2}.
Is there a way to do that with regex?
Many thanks
CodePudding user response:
Here is one approach using re.sub with a callback function:
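import re

def repl(m):
    # m.group() is one run of characters of the same type
    c = m.group()
    if re.search(r'^[A-Za-z]+$', c):
        return 'c{' + str(len(c)) + '}'
    elif re.search(r'^\d+$', c):
        return 'd{' + str(len(c)) + '}'
    else:
        return 's{' + str(len(c)) + '}'

x = "123-45AB-78!£"
# The last alternative excludes letters and digits, so a run of special
# characters cannot swallow a letter that follows it (a plain \D+ would).
print(re.sub(r'[A-Za-z]+|\d+|[^A-Za-z\d]+', repl, x))
# d{3}s{1}d{2}c{2}s{1}d{2}s{2}

Note that your input string contains non-ASCII characters. In Python 3, a str is Unicode, so len() counts characters and "£" counts as one. In Python 2, where a plain string literal is a byte string, len() would count UTF-8 bytes instead of characters. Assuming a UTF-8 encoding and a byte string s, decode it first to find the number of characters:
len(s.decode('utf8'))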
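To make the character-versus-byte distinction concrete, here is a small sketch. It assumes Python 3, where str holds Unicode code points and bytes holds encoded data; the variable names are only illustrative.

```python
s = "!£"

# len() on a str counts Unicode code points
char_count = len(s)

# len() on the UTF-8 encoding counts bytes; "£" (U+00A3) takes two bytes
byte_count = len(s.encode("utf8"))

# decoding the bytes back to str restores the character count
roundtrip_count = len(s.encode("utf8").decode("utf8"))

print(char_count, byte_count, roundtrip_count)  # 2 3 2
```

So the byte count only differs from the character count when the data is handled as encoded bytes.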
CodePudding user response:
Here is a method that first replaces each character by its type-character, then counts them with itertools.groupby. I'm not sure it is any faster than the good answer given by Tim, but it should be comparable.
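import re
from itertools import groupby

x = "123-45AB-78!£"
# Replace each character by its type character: letters, then digits, then everything else
for pat, sub in [(r"[A-Za-z]", "c"), (r"\d", "d"), (r"[^\d\w]", "s")]:
    x = re.sub(pat, sub, x)
print(x) # dddsddccsddss
# Collapse each run of identical type characters into type{length}
y = "".join([f"{k}{{{len(list(g))}}}" for k, g in groupby(x)])
print(y) # d{3}s{1}d{2}c{2}s{1}d{2}s{2}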
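As a variation on the same idea, the classification and the counting can be done in a single pass by giving groupby a key function, without any regex substitution. This is only a sketch: char_type and annotate are hypothetical helper names, and it assumes "letter" means ASCII A-Z/a-z and "digit" means 0-9, with everything else (including "_" and accented letters) treated as special.

```python
import string
from itertools import groupby

def char_type(ch):
    """Map one character to 'c' (ASCII letter), 'd' (ASCII digit) or 's' (anything else)."""
    if ch in string.ascii_letters:
        return "c"
    if ch in string.digits:
        return "d"
    return "s"

def annotate(text):
    # groupby yields one (type, run) pair per run of consecutive same-type characters
    return "".join(
        f"{kind}{{{sum(1 for _ in run)}}}"
        for kind, run in groupby(text, key=char_type)
    )

result = annotate("123-45AB-78!£")
print(result)  # d{3}s{1}d{2}c{2}s{1}d{2}s{2}
```

Because the key function decides the type, changing what counts as a letter or a special character only means editing char_type.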