I have a string with the names of cities and the number of people living in each. I need to match only the city names using a regex.
city = "New York - 8 468 000 Los Angeles - 3 849 000 Berlin - 3 645 000"
I tried this:
[a-zA-Z]+(?:[\s-][a-zA-Z]+)*$
but it returns None.
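Here is a minimal sketch of why the attempt returns None. It assumes the pattern originally had `+` after each `[a-zA-Z]` (they look lost in formatting) and was run with `re.search`. The `$` anchor forces the match to end at the very end of the string, and this string ends with digits, so nothing can match. Dropping the anchor and using `re.findall` instead collects each run of words:

```python
import re

city = "New York - 8 468 000 Los Angeles - 3 849 000 Berlin - 3 645 000"

# The attempted pattern, with the "+" quantifiers assumed restored.
attempt = r"[a-zA-Z]+(?:[\s-][a-zA-Z]+)*$"

# "$" requires the match to end at the end of the string, but every match of this
# pattern ends with a letter and the string ends with "0", so no match is possible.
print(re.search(attempt, city))  # None

# Without "$", findall returns each run of letter-words joined by one space or hyphen.
# " - " stops a run because [\s-] is a single character that must be followed by a letter.
names = re.findall(r"[a-zA-Z]+(?:[\s-][a-zA-Z]+)*", city)
print(names)  # ['New York', 'Los Angeles', 'Berlin']
```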
Answer:
If you want all the cities in a single string, you can use [a-zA-Z]+ to skip the numbers and then join the matched words:
import re
cities = " ".join(re.findall(r"[a-zA-Z]+", city))
This returns:
'New York Los Angeles Berlin'
Otherwise, if you want them as separate items, I would split on - first and then apply the same regex to each piece in a list comprehension:
cities = [" ".join(re.findall(r"[a-zA-Z]+", x)) for x in city.split("-")[:-1]]
This returns:
['New York', 'Los Angeles', 'Berlin']
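The `[:-1]` slice in the split approach is easy to miss. This self-contained sketch shows what `split("-")` produces: every piece except the last ends with a city name, and the last piece holds only Berlin's population. Without the slice, the comprehension would add an empty string at the end.

```python
import re

city = "New York - 8 468 000 Los Angeles - 3 849 000 Berlin - 3 645 000"

pieces = city.split("-")
# ['New York ', ' 8 468 000 Los Angeles ', ' 3 849 000 Berlin ', ' 3 645 000']
print(pieces)

# Keep the letters of each piece; the final piece has no letters, so it is sliced off.
cities = [" ".join(re.findall(r"[a-zA-Z]+", piece)) for piece in pieces[:-1]]
print(cities)  # ['New York', 'Los Angeles', 'Berlin']

# For comparison: the last piece alone yields "" (join of an empty list).
print(repr(" ".join(re.findall(r"[a-zA-Z]+", pieces[-1]))))  # ''
```

Note that this relies on - appearing only as the separator. A hyphenated city name would be split in two.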
Answer:
Try this:
[a-zA-Z]+ ?[a-zA-Z]+(?= *-)
See the regex demo.
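The lookahead pattern can be used directly with `re.findall`. This sketch assumes the `+` quantifiers in the pattern were lost in formatting. The `(?= *-)` lookahead only checks that optional spaces and a dash follow the name, so the dash itself is not part of the match.

```python
import re

city = "New York - 8 468 000 Los Angeles - 3 849 000 Berlin - 3 645 000"

# One word, an optional space, then more letters, kept only if " -" (or "-") follows.
pattern = r"[a-zA-Z]+ ?[a-zA-Z]+(?= *-)"
print(re.findall(pattern, city))  # ['New York', 'Los Angeles', 'Berlin']
```

Because the pattern allows at most one space, only names of one or two words are captured whole. A hypothetical three-word name such as "Rio de Janeiro - 6 000 000" would come back as 'de Janeiro'.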
Answer:
Try:
([^-]+?)\s*-\s*([\d\s]+)
import re
city = "New York - 8 468 000 Los Angeles - 3 849 000 Berlin - 3 645 000"
# group 1: the city name (lazy, stops before the dash); group 2: digits and spaces of the population
pat = re.compile(r"([^-]+?)\s*-\s*([\d\s]+)")
for c, n in pat.findall(city):
    print(c, int(n.replace(" ", "")))
Prints:
New York 8468000
Los Angeles 3849000
Berlin 3645000
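If you want to keep the pairs rather than print them, the same pattern can feed a dict comprehension. This is a small sketch building on the pattern above. The name `populations` is just an illustrative choice.

```python
import re

city = "New York - 8 468 000 Los Angeles - 3 849 000 Berlin - 3 645 000"
pat = re.compile(r"([^-]+?)\s*-\s*([\d\s]+)")

# Map each city name to its population; spaces inside the number are removed before int().
populations = {name: int(digits.replace(" ", "")) for name, digits in pat.findall(city)}
print(populations)  # {'New York': 8468000, 'Los Angeles': 3849000, 'Berlin': 3645000}
```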
EDIT: If you don't need numbers:
import re
city = "New York - 8 468 000 Los Angeles - 3 849 000 Berlin - 3 645 000"
pat = re.compile(r"([^-]+?)\s*-\s*[\d\s]+")
for c in pat.findall(city):
    print(c)
Prints:
New York
Los Angeles
Berlin