Regex returns same text in Python

Time:04-01

I want to convert addresses to an 'A 0 a' type pattern, but the code below doesn't work. It returns the same text instead of converting it.

import re
def findpattern(text: str) -> str:
    text = re.sub(r"a-z", 'a', text)
    text = re.sub(r"A-Z", 'A', text)
    text = re.sub(r"0-9", '0', text)
    text = re.sub(r"a+", 'a', text)
    text = re.sub(r"A+", 'A', text)
    text = re.sub(r"0+", '0', text)
    return text

findpattern("20th Street 2020 NE VA")


Out[9]: '20th Street 2020 NE VA'

CodePudding user response:

You need to put the character ranges in square brackets. Like this:

import re
def findpattern(text: str) -> str:
    text = re.sub(r"[a-z]", 'a', text)
    text = re.sub(r"[A-Z]", 'A', text)
    text = re.sub(r"[0-9]", '0', text)
    text = re.sub(r"a+", 'a', text)
    text = re.sub(r"A+", 'A', text)
    text = re.sub(r"0+", '0', text)
    return text

findpattern("20th Street 2020 NE VA")
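To see why the original patterns left the text unchanged, here is a minimal sketch (the sample strings are made up for illustration). Outside square brackets, a-z is just the three literal characters a, -, z; only inside brackets does it mean a range:

```python
import re

# Hypothetical sample that happens to contain the literal text "a-z".
sample = "20th Street a-z"

# Without brackets, "a-z" matches only the literal three-character string.
literal_hits = re.findall(r"a-z", sample)

# Inside brackets, a-z is a range: any single lowercase ASCII letter.
range_hits = re.findall(r"[a-z]", "20th")

# The address from the question contains no literal "a-z", so nothing matches.
address_hits = re.findall(r"a-z", "20th Street 2020 NE VA")

print(literal_hits)  # ['a-z']
print(range_hits)    # ['t', 'h']
print(address_hits)  # []
```

Since re.sub only replaces what the pattern matches, an unmatched pattern returns the input as-is.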

CodePudding user response:

As pointed out in the comments, you need to wrap your character ranges in square brackets. You can also merge the step that maps each class to a/A/0 with the step that collapses repeats:

import re
def findpattern(text: str) -> str:
    text = re.sub(r"[a-z]+", 'a', text)
    text = re.sub(r"[A-Z]+", 'A', text)
    text = re.sub(r"[0-9]+", '0', text)
    return text

findpattern("20th Street 2020 NE VA")
# '0a Aa 0 A A'

Now, while this works, your approach is quite inefficient, as you need to scan the string 6 (or 3) times.

You could instead determine the character classes in a single pass.

Here is an example:

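from itertools import groupby

def char_class(c):
    # Map a character to its class: digit -> '0', letter -> 'A' or 'a'.
    if c.isdigit():
        return '0'
    if c.isalpha():
        return 'A' if c.isupper() else 'a'
    # Anything else (spaces, punctuation) is kept as-is.
    return c

def pattern(text):
    # groupby yields one key per run of consecutive characters in the same class.
    return ''.join(k for k, _ in groupby(text, char_class))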

pattern("20th Street 2020 NE VA")
# '0a Aa 0 A A'
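
If you would rather stay with regular expressions, a single pass is also possible by passing a function as the replacement to re.sub. This is only a sketch: the names CLASS_RUN, class_of_run and pattern_regex are made up for illustration, and the pattern assumes ASCII letters and digits only.

```python
import re

# One alternation that matches a whole run of a single class at a time.
CLASS_RUN = re.compile(r"[a-z]+|[A-Z]+|[0-9]+")

def class_of_run(match):
    # Every character in the run has the same class, so inspect the first one.
    first = match.group()[0]
    if first.isdigit():
        return '0'
    return 'A' if first.isupper() else 'a'

def pattern_regex(text):
    # re.sub calls class_of_run once per matched run, in one scan of the text.
    return CLASS_RUN.sub(class_of_run, text)

result = pattern_regex("20th Street 2020 NE VA")
print(result)  # '0a Aa 0 A A'
```

One difference to keep in mind: the groupby version also collapses runs of identical other characters (for example two consecutive spaces become one), while this regex version leaves anything outside the three classes untouched.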