Substitute white spaces inside a Spanish date in a string with regex

Time:04-09

When processing strings, I have found addresses that contain Spanish dates such as '28 de julio'. I want a regex that detects the combination 'day' 'de' 'month' and removes the whitespace in between. The months in Spanish can start with 'ene', 'feb', 'mar', 'abr', 'may', 'jun', 'jul', 'ago', 'set', 'sep', 'oct', 'nov' or 'dic'.

So if I have this address: 'avenida 10 de octubre 2546, managua' I want to convert it to: 'avenida 10deoctubre 2546, managua'

This is what I have tried:

import re
my_address = 'calle 4 de julio, heredia'

def compress_date_street_name(address: str) ->str:
    if address is not None:
        result = re.sub(r'(\d{1,2}\sde\[ene*|feb*|mar*|abr*|may*|jun*|jul*|ago*|set*|sep*|oct*|nov*|dic*])','\d{1,2}de', address)
    else:
        result = None
    return result

For this string the expected result is: 'calle 4dejulio, heredia'.

But it raises a "bad escape" error. I'm also not sure whether my regex will do the detection I need. Any help will be greatly appreciated.

CodePudding user response:

You need capturing groups (parentheses) around the parts you want to keep, a slightly different pattern, and a replacement string that refers back to those groups.

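>>> import re
>>> address = 'calle 4 de julio, heredia'
>>> # (?:ene|feb|...) is an alternation of month prefixes and \w* takes the rest of the name;
>>> # square brackets would instead match any single one of the listed characters.
>>> re.sub(r'(\d{1,2})\sde\s((?:ene|feb|mar|abr|may|jun|jul|ago|set|sep|oct|nov|dic)\w*)', r'\1de\2', address)
'calle 4dejulio, heredia'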

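Your replacement string also needs to be a raw string, just like the pattern. Note that the replacement is not a regex: '\d{1,2}' has no meaning there, and the '\d' in it is what raises the "bad escape" error. Use backreferences such as \1 and \2 instead.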
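Putting this back into the function from the question might look like the sketch below. It is one possible arrangement, not the only one: the names `_MONTHS` and `_DATE_RE`, the `re.IGNORECASE` flag, the leading `\b`, and the use of `\s+` (one or more whitespace characters) are my own assumptions rather than anything the question asked for.

```python
import re
from typing import Optional

# Month prefixes listed in the question; \w* below absorbs the rest of the month name.
_MONTHS = "ene|feb|mar|abr|may|jun|jul|ago|set|sep|oct|nov|dic"

# Group 1: one or two digits at a word boundary (so '2546' is not treated as a day).
# Group 2: a word starting with one of the month prefixes.
# IGNORECASE is an assumption, so that 'Julio' is handled like 'julio'.
_DATE_RE = re.compile(rf"\b(\d{{1,2}})\s+de\s+((?:{_MONTHS})\w*)", re.IGNORECASE)


def compress_date_street_name(address: Optional[str]) -> Optional[str]:
    """Remove the whitespace inside 'day de month', leaving other text untouched."""
    if address is None:
        return None
    # Raw replacement string: \1 is the day, \2 the month name.
    return _DATE_RE.sub(r"\1de\2", address)


print(compress_date_street_name("calle 4 de julio, heredia"))
print(compress_date_street_name("avenida 10 de octubre 2546, managua"))
```

With the two addresses from the question this prints 'calle 4dejulio, heredia' and 'avenida 10deoctubre 2546, managua'. Text such as '4 de la calle' is left alone because 'la' does not start with a month prefix.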
