Regex returns same text in Python

I want to convert addresses to an 'A 0 a' style pattern, but the code below doesn't work. It returns the same text instead of converting it.

import re
def findpattern(text: str) -> str:
    text = re.sub(r"a-z", 'a', text)
    text = re.sub(r"A-Z", 'A', text)
    text = re.sub(r"0-9", '0', text)
    text = re.sub(r"a+", 'a', text)
    text = re.sub(r"A+", 'A', text)
    text = re.sub(r"0+", '0', text)
    return text

findpattern("20th Street 2020 NE VA")


Out[9]: '20th Street 2020 NE VA'

CodePudding user response：

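You need to put the character ranges in square brackets so that they become character classes. Without the brackets, a pattern such as r"a-z" only matches the literal text a-z. Like this: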

import re
def findpattern(text: str) -> str:
    text = re.sub(r"[a-z]", 'a', text)
    text = re.sub(r"[A-Z]", 'A', text)
    text = re.sub(r"[0-9]", '0', text)
    text = re.sub(r"a+", 'a', text)
    text = re.sub(r"A+", 'A', text)
    text = re.sub(r"0+", '0', text)
    return text

findpattern("20th Street 2020 NE VA")
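
To see why the original patterns left the text unchanged, it may help to compare the two forms side by side. This is only a small sketch; the variable names (sample, literal_hits, class_hits) are made up for illustration and the sample string is the one from the question.

```python
import re

sample = "20th Street 2020 NE VA"

# Without brackets, "a-z" is the literal three-character text 'a-z',
# which does not occur anywhere in the sample.
literal_hits = re.findall(r"a-z", sample)

# With brackets, "[a-z]" is a character class that matches any single
# lowercase ASCII letter.
class_hits = re.findall(r"[a-z]", sample)

print(literal_hits)  # []
print(class_hits)    # ['t', 'h', 't', 'r', 'e', 'e', 't']
```

Since re.sub only replaces what the pattern matches, a pattern with no matches returns the input as-is, which is the behaviour seen in the question.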

CodePudding user response：

As pointed out in the comments, you need to wrap your character classes in brackets. You can also merge the step that transforms a class to a/A/0 with the following step that collapses repeats, by adding the + quantifier to each class:

import re
def findpattern(text: str) -> str:
    text = re.sub(r"[a-z]+", 'a', text)
    text = re.sub(r"[A-Z]+", 'A', text)
    text = re.sub(r"[0-9]+", '0', text)
    return text

findpattern("20th Street 2020 NE VA")
# '0a Aa 0 A A'

Now, while this works, your approach is quite inefficient, as you need to scan the string 6 (or 3) times.

You could instead determine the character classes in a single pass.

Here is an example:

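from itertools import groupby

def char_class(c):
    # Map a character to its class symbol; any other character is kept as-is.
    if c.isdigit():
        return '0'
    if c.isalpha():
        return 'A' if c.isupper() else 'a'
    return c

def pattern(text):
    # groupby collapses each run of equal class symbols into a single key.
    return ''.join(k for k, _ in groupby(text, char_class))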

pattern("20th Street 2020 NE VA")
# '0a Aa 0 A A'
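
If you would rather stay with re, a single pass is also possible by passing a function as the replacement. The following is only a sketch of that idea, not part of the answer above; the names RUNS, class_symbol and pattern_regex are hypothetical.

```python
import re

# One alternation covers all three classes; each run is matched as a whole.
RUNS = re.compile(r"[a-z]+|[A-Z]+|[0-9]+")

def class_symbol(match):
    # Every character in a matched run belongs to the same class,
    # so looking at the first character is enough.
    first = match.group()[0]
    if first.isdigit():
        return '0'
    return 'A' if first.isupper() else 'a'

def pattern_regex(text):
    # re.sub calls class_symbol once per matched run.
    return RUNS.sub(class_symbol, text)

print(pattern_regex("20th Street 2020 NE VA"))
# 0a Aa 0 A A
```

The two versions are not strictly equivalent: the groupby version relies on str.isdigit and str.isalpha, which also accept non-ASCII letters and digits, and it collapses runs of any repeated character (for example two consecutive spaces), whereas this regex only touches ASCII letters and digits.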