How to split Chinese and English words once only?

Time:08-30

I am using Python and I would like to split the following string:

string = '小西 - 杏花 Siu Sai - Heng Fa'

I would like to split it into two parts: 小西 - 杏花 and Siu Sai - Heng Fa. I have tried several approaches but still could not split the string properly.

Thanks in advance

CodePudding user response:

One option is to split at the first English letter and take the first and second elements of the result:

import re

inputstring = '小西 - 杏花 Siu Sai - Heng Fa'
a = re.split(r'([a-zA-Z].*)', inputstring)
>>>['小西 - 杏花 ', 'Siu Sai - Heng Fa', '']

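Another way to do this without an empty string is to use a lookahead regex, (?=[a-zA-Z]), as pointed out by @blhsing. Passing maxsplit=1 makes it split only once, before the first English letter (splitting on an empty match requires Python 3.7 or later):

a = re.split(r'(?=[a-zA-Z])', inputstring, maxsplit=1)
>>>['小西 - 杏花 ', 'Siu Sai - Heng Fa']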
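The lookahead split can be wrapped in a small helper. This is only a minimal sketch, assuming Python 3.7 or later; the function name split_at_first_latin is hypothetical, and stripping the surrounding whitespace is an added convenience rather than part of the answer above.

```python
import re

def split_at_first_latin(text):
    # Hypothetical helper: split once, just before the first ASCII letter.
    # (?=[a-zA-Z]) is a zero-width lookahead, so no characters are consumed.
    parts = re.split(r'(?=[a-zA-Z])', text, maxsplit=1)
    # Remove the space left at the end of the first part.
    return [p.strip() for p in parts]

result = split_at_first_latin('小西 - 杏花 Siu Sai - Heng Fa')
print(result)
```

If the string contains no English letter, re.split finds nothing to split on and the helper returns a one-element list.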

CodePudding user response:

If the pattern you're looking for is "a series of characters that are not English letters, followed by a whitespace character, followed by a series of English letters, whitespace and dashes":

import re

text = '小西 - 杏花 Siu Sai - Heng Fa'

m = re.match(r'([^a-zA-Z]+)\s([a-zA-Z\s-]+)', text)
print(f'"{m.group(1)}"')
print(f'"{m.group(2)}"')

Output:

"小西 - 杏花"
"Siu Sai - Heng Fa"

So, m.group(1) and m.group(2) will be the parts of the string you're after.
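One caveat: re.match returns None when the string does not fit the pattern, in which case m.group(1) raises an AttributeError. A guarded version might look like the sketch below; the function name split_cjk_latin and the choice to return None on a non-match are assumptions made for illustration.

```python
import re

# Non-letters, one whitespace character, then letters/whitespace/dashes.
PATTERN = re.compile(r'([^a-zA-Z]+)\s([a-zA-Z\s-]+)')

def split_cjk_latin(text):
    # Hypothetical helper: return (first_part, second_part), or None
    # when the text does not match the expected layout.
    m = PATTERN.match(text)
    if m is None:
        return None
    return m.group(1), m.group(2)

print(split_cjk_latin('小西 - 杏花 Siu Sai - Heng Fa'))
print(split_cjk_latin('Siu Sai'))
```

The caller can then check for None before unpacking the two parts.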
