I am using Python and I would like to split the following string:
string = '小西 - 杏花 Siu Sai - Heng Fa'
I would like to split it into two parts: 小西 - 杏花 and Siu Sai - Heng Fa. I have tried several approaches but still can't split the string properly.
Thanks in advance
CodePudding user response:
One option is to split just before the first English letter and take the first and second elements of the result:
import re
inputstring = '小西 - 杏花 Siu Sai - Heng Fa'
a = re.split(r'([a-zA-Z].*)', inputstring)
print(a)
# ['小西 - 杏花 ', 'Siu Sai - Heng Fa', '']
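Another way to do this without the trailing empty string is to split on a lookahead, (?=[a-zA-Z]), as pointed out by @blhsing. Limit it to a single split, otherwise the string is cut before every letter (splitting on a zero-width match needs Python 3.7 or later):
a = re.split(r'(?=[a-zA-Z])', inputstring, maxsplit=1)
# ['小西 - 杏花 ', 'Siu Sai - Heng Fa']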
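Both variants leave a trailing space on the first part. Here is a minimal sketch that splits once on the lookahead and strips each piece, assuming Python 3.7+ and that "English" means ASCII letters only. The helper name split_at_first_latin is made up for illustration.

```python
import re

def split_at_first_latin(s):
    # Hypothetical helper: split once, just before the first ASCII letter.
    # The lookahead is zero-width, so no characters are consumed by the split.
    parts = re.split(r'(?=[a-zA-Z])', s, maxsplit=1)
    # Drop the whitespace left around the split point.
    return [p.strip() for p in parts]

print(split_at_first_latin('小西 - 杏花 Siu Sai - Heng Fa'))
# ['小西 - 杏花', 'Siu Sai - Heng Fa']
```

If the string contains no ASCII letter, the list has a single element, so check its length before unpacking.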
CodePudding user response:
If the pattern you're looking for is "a series of non-alphabetical characters, followed by a space, followed by a series of alphabetical characters, spaces and dashes":
import re
text = '小西 - 杏花 Siu Sai - Heng Fa'
m = re.match(r'([^a-zA-Z]+)\s([a-zA-Z\s-]+)', text)
print(f'"{m.group(1)}"')
print(f'"{m.group(2)}"')
Output:
"小西 - 杏花"
"Siu Sai - Heng Fa"
So m.group(1) and m.group(2) will be the parts of the string you're after.
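One thing to watch: re.match returns None when the input doesn't fit the pattern, and calling m.group(1) on None then raises AttributeError. A small sketch of a guarded version, where the function name split_names is my own and not part of the answer above:

```python
import re

# Same pattern as above: non-letters, one whitespace, then letters/spaces/dashes.
PATTERN = re.compile(r'([^a-zA-Z]+)\s([a-zA-Z\s-]+)')

def split_names(text):
    # Hypothetical wrapper: return both parts, or None if the text doesn't fit.
    m = PATTERN.match(text)
    if m is None:
        return None
    return m.group(1), m.group(2)

print(split_names('小西 - 杏花 Siu Sai - Heng Fa'))
# ('小西 - 杏花', 'Siu Sai - Heng Fa')
print(split_names('Siu Sai'))
# None
```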