I need to turn strings into a certain format:
#1 - New York
to New York
#4 - London
to London
etc.
Originally I just removed every non-letter character, but that also removed the spaces, so I got errors such as NewYork.
My original approach:
''.join(filter(str.isalpha, myString))
So I basically need to remove the #, number, spaces (before the city name starts) and the -
CodePudding user response:
I suggest splitting the string into two chunks at the ' - ' substring and grabbing the last chunk:
result = myString.split(' - ', 1)[-1]
See a Python demo:
texts = ['#1 - New York', '#4 - London']
for myString in texts:
    print(myString, '=>', myString.split(' - ', 1)[-1])
Output:
#1 - New York => New York
#4 - London => London
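As a minimal sketch of how the split approach behaves on less tidy input, here is the same expression wrapped in a helper. The function name strip_prefix and the extra sample strings are made up for illustration and are not from the question. With maxsplit=1 only the first ' - ' is used as the cut point, and a string with no ' - ' at all comes back unchanged.

```python
def strip_prefix(text, separator=' - '):
    # Hypothetical helper: keep everything after the first separator.
    # If the separator is absent, split() returns [text], so [-1] is text itself.
    return text.split(separator, 1)[-1]


samples = ['#1 - New York', '#4 - London', '#7 - Stratford - upon - Avon', 'Paris']
cleaned = [strip_prefix(s) for s in samples]
print(cleaned)
```

Note that this relies on the separator being exactly space, hyphen, space; a string like '#4-London' would be returned as is.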
Regarding the regex solution, you might want to remove any non-letters at the start of the string with re.sub(r'^[\W\d_]+', '', myString) or re.sub(r'^[^a-zA-Z]+', '', myString). Note that [\W\d_] is fully Unicode-aware, while [^a-zA-Z] is ASCII-only.
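To make the Unicode vs. ASCII difference concrete, here is a small sketch comparing the two patterns. The function names and the 'Łódź' sample are assumptions added for illustration; only the two patterns come from the answer.

```python
import re


def strip_leading_non_letters(text):
    # Unicode-aware: strips leading characters that are non-word, digits, or underscores.
    return re.sub(r'^[\W\d_]+', '', text)


def strip_leading_non_ascii_letters(text):
    # ASCII-only: strips everything up to the first character in a-z or A-Z.
    return re.sub(r'^[^a-zA-Z]+', '', text)


for sample in ['#1 - New York', '#4 - London', '#2 - Łódź']:
    print(sample, '=>', strip_leading_non_letters(sample),
          '|', strip_leading_non_ascii_letters(sample))
```

For plain English city names both patterns give the same result. For a name that starts with a non-ASCII letter, the ASCII-only pattern also eats the leading accented letters, so the Unicode-aware one is the safer default.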
CodePudding user response:
If you still want to use regex despite having the simpler approach with split:
#\d+\s*-\s*(.*)
# matches the # symbol
\d+ matches at least one digit after it
\s* matches any whitespace that could follow
- matches the hyphen
\s* matches any whitespace that could follow
(.*) creates a capture group for all the other characters up until a line break
Then to match:
import re
match = re.search(r"#\d+\s*-\s*(.*)", "#1 - New York")
if match:
    result = match.group(1)
    print(result)
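If you need this for a whole list of strings, one possible sketch is to compile the pattern once and fall back to the original string when nothing matches. The names PREFIX_PATTERN and extract_city, the fallback behavior, and the sample inputs are assumptions for illustration, not part of the question.

```python
import re

# Same pattern as above, compiled once for reuse.
PREFIX_PATTERN = re.compile(r"#\d+\s*-\s*(.*)")


def extract_city(text):
    # Hypothetical helper: return the captured group, or the input unchanged
    # when the string does not contain a "#<number> -" prefix.
    match = PREFIX_PATTERN.search(text)
    return match.group(1) if match else text


cities = [extract_city(s) for s in ['#1 - New York', '#4-London', '#12   -  Berlin', 'Paris']]
print(cities)
```

Unlike the split approach, this tolerates missing or extra whitespace around the hyphen, because both \s* parts are optional.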