Regex to match city names from text with numbers

Time:11-14

I have a string with the names of cities and the number of people living in each. I need to match only the city names using a regex.

city = "New York - 8 468 000 Los Angeles - 3 849 000 Berlin - 3 645 000"

I tried this:

[a-zA-Z]+(?:[\s-][a-zA-Z]+)*$

but it returns None.
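A likely cause, assuming the pattern was run with re.search or re.match (the question does not say which): the trailing $ requires the match to end at the end of the string, and this string ends in digits. The sketch below shows that behaviour and what the same pattern yields without the anchor; the variable names are only illustrative.

```python
import re

city = "New York - 8 468 000 Los Angeles - 3 849 000 Berlin - 3 645 000"

# With the trailing $, a run of letters must reach the end of the string.
# The string ends in "000", so nothing matches.
anchored = re.compile(r"[a-zA-Z]+(?:[\s-][a-zA-Z]+)*$")
print(anchored.search(city))  # None

# Without the anchor, findall returns every run of words made of letters.
unanchored = re.compile(r"[a-zA-Z]+(?:[\s-][a-zA-Z]+)*")
names = unanchored.findall(city)
print(names)  # ['New York', 'Los Angeles', 'Berlin']
```

Because the group is non-capturing, findall returns the whole match for each city rather than only the last word.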

CodePudding user response:

If you want all cities as a single string, you can use [a-zA-Z]+ to skip the numbers and join the matches:

cities = " ".join(re.findall("[a-zA-Z]+", city))

Returning:

'New York Los Angeles Berlin'

Otherwise, if you want them separated, I would split on - first and then apply the same method to each piece in a list comprehension:

cities = [" ".join(re.findall("[a-zA-Z]+", x)) for x in city.split('-')[:-1]]

Returning:

['New York', 'Los Angeles', 'Berlin']

CodePudding user response:

Try this:

[a-zA-Z]+ ?[a-zA-Z]+(?= *-)

See regex demo.
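A minimal sketch of how this pattern might be used with re.findall, assuming the same input string as in the question. The lookahead (?= *-) checks for the hyphen without consuming it, so only the name is returned. Note that the pattern as written expects names of one or two words with at least two letters in total.

```python
import re

city = "New York - 8 468 000 Los Angeles - 3 849 000 Berlin - 3 645 000"

# One or two words of letters, followed by optional spaces and a hyphen.
# The lookahead asserts the hyphen is there but leaves it out of the match.
pattern = re.compile(r"[a-zA-Z]+ ?[a-zA-Z]+(?= *-)")

cities = pattern.findall(city)
print(cities)  # ['New York', 'Los Angeles', 'Berlin']
```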

CodePudding user response:

Try:

([^-]+?)\s*-\s*([\d\s]+)

Regex demo.


import re

city = "New York - 8 468 000 Los Angeles - 3 849 000 Berlin - 3 645 000"

pat = re.compile(r"([^-]+?)\s*-\s*([\d\s]+)")

for c, n in pat.findall(city):
    print(c, int(n.replace(" ", "")))

Prints:

New York 8468000
Los Angeles 3849000
Berlin 3645000

EDIT: If you don't need numbers:

import re

city = "New York - 8 468 000 Los Angeles - 3 849 000 Berlin - 3 645 000"

pat = re.compile(r"([^-]+?)\s*-\s*[\d\s]+")

for c in pat.findall(city):
    print(c)

Prints:

New York
Los Angeles
Berlin