How can I generate a list with regex in Python for countries with compound names?
names = ['Nizhniy Novgorod', 'Cần Thơ', 'Ba Beja', 'Bandar Bampung', 'Benin City', 'Ciudad Nezahualcóyotl', 'Biên Hòa', 'São Gonçalo', 'São Luís', 'New Orleans', 'Thủ Đức']
I was trying to do this but it returns all names:
import re
lst = []
for word in names:
    if re.findall(r'[A-Z]\w \b', word[0]) == re.findall(r'\b[A-Z]\w ', word[1]):
        lst.append(word)
print(lst)
Output:
['Nizhniy Novgorod', 'Cần Thơ', 'Ba Beja', 'Bandar Bampung', 'Benin City', 'Ciudad Nezahualcóyotl', 'Biên Hòa', 'São Gonçalo', 'São Luís', 'New Orleans', 'Thủ Đức']
The desired output would be [Ba Beja, Bandar Bampung].
It is an exercise, which is why I can only use the re module. Any help would be appreciated.
CodePudding user response:
I have two answers for you: one that uses regex and one that doesn't.
Both keep a name when its two words start with the same letter.
Here is the regex version:
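import re
names = ['Nizhniy Novgorod', 'Cần Thơ', 'Ba Beja', 'Bandar Bampung', 'Benin City', 'Ciudad Nezahualcóyotl', 'Biên Hòa', 'São Gonçalo', 'São Luís', 'New Orleans', 'Thủ Đức']
# The group captures the first letter; \1 requires the second word to start with that same letter.
# Use A-Za-z rather than A-z: the range A-z also matches [, \, ], ^, _ and `.
pattern = re.compile(r'^([A-Za-zÀ-ứ])[A-Za-zÀ-ứ]*\s\1[A-Za-zÀ-ứ]*$')
lst = []
for line in names:
    if pattern.search(line):
        lst.append(line)
print(lst)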
OUTPUT:
['Nizhniy Novgorod', 'Ba Beja', 'Bandar Bampung']
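If you would rather not hard-code a range of accented characters, here is a minimal sketch of the same idea. It assumes that "letter" means any Unicode letter, which `[^\W\d_]` expresses as a word character that is neither a digit nor an underscore. The names `LETTER`, `same_initial` and `same_initial_names` are made up for this example, and `fullmatch` stands in for the `^...$` anchors.

```python
import re

names = ['Nizhniy Novgorod', 'Cần Thơ', 'Ba Beja', 'Bandar Bampung', 'Benin City',
         'Ciudad Nezahualcóyotl', 'Biên Hòa', 'São Gonçalo', 'São Luís',
         'New Orleans', 'Thủ Đức']

# Any Unicode letter: a word character that is not a digit and not an underscore.
LETTER = r"[^\W\d_]"

# Group 1 captures the first letter of the first word; \1 forces the second
# word to begin with exactly the same letter (the comparison is case-sensitive).
same_initial = re.compile(rf"({LETTER}){LETTER}*\s\1{LETTER}*")


def same_initial_names(candidates):
    """Return the two-word names whose words share the same first letter."""
    return [name for name in candidates if same_initial.fullmatch(name)]


print(same_initial_names(names))
# ['Nizhniy Novgorod', 'Ba Beja', 'Bandar Bampung']
```

Like the pattern above, this only accepts names made of exactly two words separated by a single whitespace character.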
And here is the other answer that does not use Regex:
names = ['Nizhniy Novgorod', 'Cần Thơ', 'Ba Beja', 'Bandar Bampung', 'Benin City', 'Ciudad Nezahualcóyotl', 'Biên Hòa', 'São Gonçalo', 'São Luís', 'New Orleans', 'Thủ Đức']
lst = []
space = ' '
for line in names:
    if space in line:
        # maxsplit=1 keeps the unpacking safe if a name has more than two words
        first, second = line.split(space, 1)
        # compare the first character of each part
        if first[0] == second[0]:
            lst.append(line)
print(lst)
OUTPUT:
['Nizhniy Novgorod', 'Ba Beja', 'Bandar Bampung']
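As for why the attempt in the question returns every name: inside the loop, `word` is one full name, so `word[0]` and `word[1]` are its first two characters, not its two words. Both patterns need at least three characters (a capital letter, a word character and a space), so `re.findall` returns an empty list for each single character, and two empty lists always compare equal. A small sketch to check this, using 'Benin City' as an arbitrary sample:

```python
import re

word = 'Benin City'

# Indexing a string gives single characters, not words.
print(word[0], word[1])  # B e

# Each pattern needs three characters, so a one-character input never matches.
first = re.findall(r'[A-Z]\w \b', word[0])
second = re.findall(r'\b[A-Z]\w ', word[1])

print(first, second, first == second)  # [] [] True
```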