What regular expression would allow me to remove numbers up until English characters start? (Python)

Time:03-14

I need to turn strings like these into just the city name:

#1 - New York => New York

#4 - London => London

etc.

Originally I just removed all non-letter characters, but that also stripped the spaces, so I ended up with results such as NewYork.

My original approach:

''.join(filter(str.isalpha, myString))

So I basically need to remove the #, the number, the hyphen, and the spaces before the city name starts.
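For reference, here is a minimal sketch of what the original filter does, plus one small variation (my own, not from the question) that keeps spaces and then trims the leftovers from the "#1 - " prefix. Be aware that the variation would still drop hyphens or apostrophes inside names such as Stoke-on-Trent.

```python
def letters_only(s: str) -> str:
    # Original approach: keeps only alphabetic characters, so spaces are lost too.
    return ''.join(filter(str.isalpha, s))


def letters_and_spaces(s: str) -> str:
    # Variation (assumption, not from the question): keep letters and spaces,
    # then strip the spaces left over from the "#1 - " prefix.
    return ''.join(c for c in s if c.isalpha() or c == ' ').strip()


print(letters_only('#1 - New York'))        # NewYork
print(letters_and_spaces('#1 - New York'))  # New York
```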

Answer:

I suggest splitting the string into at most two chunks on the ' - ' substring and taking the last chunk:

result = myString.split(' - ', 1)[-1]

See a Python demo:

texts = ['#1 - New York', '#4 - London']
for myString in texts:
    print(myString, '=>', myString.split(' - ', 1)[-1])

Output:

#1 - New York => New York
#4 - London => London
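The maxsplit argument of 1 and the [-1] index both matter for edge cases. In the small sketch below (the extra sample strings are made up for illustration), only the first ' - ' is split on, so a later separator survives, and [-1] returns the whole string unchanged when there is no ' - ' at all.

```python
def city_name(my_string: str) -> str:
    # Split on the first ' - ' only; [-1] is the text after it,
    # or the whole string if the separator is missing.
    return my_string.split(' - ', 1)[-1]


samples = ['#12 - New York', 'London', '#3 - Paris - France']
for s in samples:
    print(s, '=>', city_name(s))
# #12 - New York => New York
# London => London
# #3 - Paris - France => Paris - France
```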

Regarding a regex solution, you might want to remove any non-letters at the start of the string with re.sub(r'^[\W\d_]+', '', myString) or re.sub(r'^[^a-zA-Z]+', '', myString). Note that [\W\d_] is a fully Unicode-aware pattern, while [^a-zA-Z] only treats ASCII letters as letters.
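A minimal runnable comparison of the two re.sub patterns is below. The town name Épinal is only an illustrative input, chosen because it starts with a non-ASCII letter, which is where the two patterns diverge.

```python
import re

# Strips leading non-word characters, digits and underscores (Unicode-aware).
unicode_aware = re.compile(r'^[\W\d_]+')
# Treats only A-Z and a-z as letters, so accented letters get stripped too.
ascii_only = re.compile(r'^[^a-zA-Z]+')

for s in ['#1 - New York', '#5 - Épinal']:
    print(repr(unicode_aware.sub('', s)), repr(ascii_only.sub('', s)))
# 'New York' 'New York'
# 'Épinal' 'pinal'
```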

Answer:

If you still want to use a regex despite the simpler split approach, this pattern works:

#\d+\s*-\s*(.*)
  • # matches the # symbol
  • \d+ matches one or more digits after it
  • \s* matches any whitespace that could follow
  • - matches the hyphen
  • \s* matches any whitespace that could follow
  • (.*) captures all remaining characters up to the end of the line
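The bullet-by-bullet breakdown can also be written directly into the pattern with the re.VERBOSE flag. This is a sketch assuming the intended quantifier is \d+ (one or more digits); in verbose mode, whitespace in the pattern is ignored and # starts a comment, so the literal # has to be escaped.

```python
import re

PATTERN = re.compile(r"""
    \#        # the literal # symbol (escaped, since # starts a comment here)
    \d+       # one or more digits
    \s*       # optional whitespace
    -         # the hyphen
    \s*       # optional whitespace
    (.*)      # capture group 1: the rest of the line
""", re.VERBOSE)

for s in ['#1 - New York', '#12 - London']:
    m = PATTERN.search(s)
    if m:
        print(m.group(1))
# New York
# London
```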

Then, to extract the city name:

import re
match = re.search(r"#\d+\s*-\s*(.*)", "#1 - New York")
if match:
    result = match.group(1)
    print(result)