I have a Python list of several cities and I’m trying to match only those that contain certain accented characters. My pattern catches many of them, but not all (it fails on 'El Fuerte de la Unión', for example).
I also feel that my syntax could be more efficient: I’m adding word and whitespace characters one at a time, but there must be a better way. I’m not sure how to construct a search that allows for any number of words or spaces before the word containing the required characters.
This is my syntax:
'(\w*\s*\w*\s*\w*[áíúóéó] .*?\s*\w*)'
This is a portion of the list:
'Name', 'San Marcos Tecomaxusco', 'Teapa', 'Tatatila', 'Sisal', 'Simojovel de Allende', 'El Fuerte de la Unión', 'Santiago Zoochila', 'Santiago Nuyoó', 'Miahuatlán', 'Santa María Huazolotitlán', 'Santa María Chimalhuacán', 'Santa María Apazco', 'Santa Cruz Ozolotepec', 'San Simón', 'Huixcolotla', 'San Rafael Ixtapalucan', 'San Pedro Mixtepec', 'San Pedro Huilotepec', 'San Miguel Balderas', 'San Mateo Almomoloha', 'San Martín Chalchicuautla', 'Teolocholco', 'San Luis Ayucán', 'San Juan Zitlaltepec', 'San José de las Flores', 'San Jerónimo Xayacatlán', 'San Hipólito', 'San Francisco Oxtotilpan', 'San Cristóbal de las Casas'
Here is a link to regex101 with the full list: https://regex101.com/r/6WlY8o/1
Answer:
You can try using the following regex:
'[^']*[áíúóéó][^']*'
Regex Explanation:
' : a single quote
[^']* : any run of characters other than a single quote
[áíúóéó] : one accented character
[^']* : any run of characters other than a single quote
' : a single quote
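As a rough illustration of how this pattern behaves in Python, here is a minimal sketch using re.findall. It assumes the names sit in one long string, each wrapped in single quotes, as on regex101. The variable names and the shortened sample string are made up for the example. The duplicate ó in the character class is redundant, so it is dropped here.

```python
import re

# Hypothetical sample: a few entries from the question's list, as one string.
text = "'Teapa', 'El Fuerte de la Unión', 'Sisal', 'Santa María Apazco', 'San Simón'"

# Quote, run of non-quotes, one accented vowel, run of non-quotes, quote.
pattern = re.compile(r"'[^']*[áíúóé][^']*'")

# findall returns every non-overlapping match, quotes included.
matches = pattern.findall(text)
print(matches)
```

Because [^']* can never cross a quote, each match stays inside a single quoted name, no matter how many words precede the accented one.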
If you don't want the single quotes included in the match, you can use lookarounds instead:
(?<=')[^']*[áíúóéó][^']*(?=')
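A small sketch of the lookaround version in Python, again assuming the names are held in one quoted, comma-separated string (the sample string and variable names are illustrative only):

```python
import re

# Hypothetical sample string in the same quoted, comma-separated layout.
text = "'Teapa', 'El Fuerte de la Unión', 'Sisal', 'Miahuatlán'"

# The lookbehind and lookahead require a quote on each side
# but do not consume it, so the quotes stay out of the result.
pattern = re.compile(r"(?<=')[^']*[áíúóé][^']*(?=')")

names = pattern.findall(text)
print(names)
```

The ", " separators between names also sit between two quotes, but they contain no accented character, so they are never returned.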
Check the demo here.
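If the names are already separate items in a Python list, as the question suggests, it may be simpler to test each item on its own rather than match across one big string. This is only a sketch of that alternative, not part of the regex above. The list contents are a shortened sample and the variable names are assumptions.

```python
import re

# Hypothetical shortened sample of the list from the question.
cities = ['Teapa', 'El Fuerte de la Unión', 'Santiago Nuyoó', 'Sisal', 'San Hipólito']

# A single accented vowel anywhere in the name is enough.
accented = re.compile(r"[áíúóé]")

# re.search scans the whole string, so words and spaces before
# the accented character need no special handling.
with_accents = [city for city in cities if accented.search(city)]
print(with_accents)
```

This avoids describing the surrounding words at all: the pattern only states what must be present, and search handles the rest.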