I want to convert addresses to an "A 0 a" type pattern, but the code below doesn't work. It returns the same text instead of converting it.
import re

def findpattern(text: str) -> str:
    text = re.sub(r"a-z", 'a', text)
    text = re.sub(r"A-Z", 'A', text)
    text = re.sub(r"0-9", '0', text)
    text = re.sub(r"a+", 'a', text)
    text = re.sub(r"A+", 'A', text)
    text = re.sub(r"0+", '0', text)
    return text

findpattern("20th Street 2020 NE VA")
Out[9]: '20th Street 2020 NE VA'
CodePudding user response:
You need to put the character ranges in square brackets so that they form character classes. Like this:
import re

def findpattern(text: str) -> str:
    # Replace every character with a placeholder for its class.
    text = re.sub(r"[a-z]", 'a', text)
    text = re.sub(r"[A-Z]", 'A', text)
    text = re.sub(r"[0-9]", '0', text)
    # Collapse runs of the same placeholder into one.
    text = re.sub(r"a+", 'a', text)
    text = re.sub(r"A+", 'A', text)
    text = re.sub(r"0+", '0', text)
    return text

findpattern("20th Street 2020 NE VA")
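To see why the brackets matter: without them, `a-z` is treated as the literal three-character sequence "a-z", not as a range. A minimal sketch (the sample string is made up for illustration):

```python
import re

sample = "a-z abc"  # hypothetical input, chosen to contain the literal text "a-z"

# Without brackets, the pattern only matches the literal characters "a-z".
literal = re.sub(r"a-z", "x", sample)

# With brackets, the pattern is a character class matching any lowercase ASCII letter.
char_class = re.sub(r"[a-z]", "x", sample)

print(literal)     # x abc
print(char_class)  # x-x xxx
```

The address in the question contains no literal "a-z", "A-Z" or "0-9", so the first three substitutions there never match anything.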
CodePudding user response:
As pointed out in the comments, you need to wrap your character ranges in square brackets to turn them into character classes. You can also merge each step that transforms a class to a/A/0 with the following step that collapses repeats:
import re

def findpattern(text: str) -> str:
    # Each substitution replaces a whole run of one class with a single placeholder.
    text = re.sub(r"[a-z]+", 'a', text)
    text = re.sub(r"[A-Z]+", 'A', text)
    text = re.sub(r"[0-9]+", '0', text)
    return text

findpattern("20th Street 2020 NE VA")
# '0a Aa 0 A A'
Now, while this works, your approach is quite inefficient, as it needs to scan the string 6 (or 3) times.
You could instead determine the character classes in a single pass.
Here is an example:
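from itertools import groupby

def char_class(c):
    # Map a character to the placeholder for its class; anything else passes through.
    if c.isdigit():
        return '0'
    if c.isalpha():
        return 'A' if c.isupper() else 'a'
    return c

def pattern(text):
    # groupby yields one key per run of consecutive characters with the same class.
    return ''.join(k for k, _ in groupby(text, char_class))

pattern("20th Street 2020 NE VA")
# '0a Aa 0 A A'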
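If you would rather stay with `re`, a single pass is also possible by passing a function as the replacement to `re.sub`. This is only a sketch of that idea: the names `_RUNS`, `_placeholder` and `pattern_re` are made up here, and it assumes that only ASCII letters and digits need to be classified.

```python
import re

# One alternation that matches a whole run of a single class (ASCII only).
_RUNS = re.compile(r"[a-z]+|[A-Z]+|[0-9]+")

def _placeholder(match: re.Match) -> str:
    # Every character in the run has the same class, so the first one decides.
    first = match.group(0)[0]
    if first.isdigit():
        return "0"
    return "A" if first.isupper() else "a"

def pattern_re(text: str) -> str:
    # re.sub calls _placeholder once per matched run, in a single scan of the text.
    return _RUNS.sub(_placeholder, text)

print(pattern_re("20th Street 2020 NE VA"))
# 0a Aa 0 A A
```

Unlike the `groupby` version, this leaves repeated spaces and punctuation untouched and ignores non-ASCII letters, whereas `str.isalpha` and `str.isdigit` also accept Unicode characters.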