For example:
import re
s1 = 'LOGO 设计'
# s2 = '设计 LOGO'
s = re.sub('[a-zA-Z0-9]{3,}(\s)[^a-zA-Z0-9]', '', s1)
print(s)
I want to find at least 3 ASCII characters followed by a space and then a non-ASCII character, and replace that whitespace with an empty string. My code has two issues:
How do I write the replacement string so that only the (\s) is removed?
How do I make it also work for the reverse order in s2, where the [^a-zA-Z0-9] character comes first?
CodePudding user response:
Put the strings that you want to keep in the result in capture groups, then reference them in the replacement.
s = re.sub(r'([a-zA-Z0-9]{3})\s([^a-zA-Z0-9])', r'\1\2', s1)
You don't need to use {3,}; you can just use {3}. This will copy the last 3 characters to the result. All the preceding characters will be copied by default because they're not being replaced.
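As a quick check of the capture-group approach, here is a minimal sketch. The second input string is made up for illustration and is not from the question.

```python
import re

# Capture the 3 ASCII characters before the space and the character after it,
# then write both groups back without the space between them.
pattern = re.compile(r'([a-zA-Z0-9]{3})\s([^a-zA-Z0-9])')

print(pattern.sub(r'\1\2', 'LOGO 设计'))  # LOGO设计
# Illustrative input: only 2 ASCII characters precede the space, so nothing matches.
print(pattern.sub(r'\1\2', 'AB 设计'))    # AB 设计
```

In the first call only 'OGO 设' is matched and rewritten; the leading 'L' is outside the match and is left alone.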
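You can also do it with lookarounds, by matching a space that's preceded by 3 ASCII characters and followed by a non-ASCII character. Then you replace the space with an empty string. Python's re module requires lookbehinds to be fixed-width, so {3} (not {3,}) is needed here.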
s = re.sub(r'(?<=[a-zA-Z0-9]{3})\s(?=[^a-zA-Z0-9])', '', s1)
You can use alternation with this method to match both orders:
s = re.sub(r'(?<=[a-zA-Z0-9]{3})\s(?=[^a-zA-Z0-9])|(?<=[^a-zA-Z0-9])\s(?=[a-zA-Z0-9]{3})', '', s1)
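To try the alternation on both orders, the pattern can be wrapped in a small helper. This is only a sketch: the names BOUNDARY_SPACE and remove_boundary_space are made up for illustration, and the last two inputs are not from the question.

```python
import re

# Same two alternatives as above: ASCII run then space then other character,
# or other character then space then ASCII run.
BOUNDARY_SPACE = re.compile(
    r'(?<=[a-zA-Z0-9]{3})\s(?=[^a-zA-Z0-9])'
    r'|(?<=[^a-zA-Z0-9])\s(?=[a-zA-Z0-9]{3})'
)

def remove_boundary_space(text):
    """Remove a whitespace character that sits between the two kinds of runs."""
    return BOUNDARY_SPACE.sub('', text)

print(remove_boundary_space('LOGO 设计'))    # LOGO设计
print(remove_boundary_space('设计 LOGO'))    # 设计LOGO
print(remove_boundary_space('LOGO design'))  # LOGO design (ASCII on both sides, unchanged)
print(remove_boundary_space('LOGO !'))       # LOGO! ([^a-zA-Z0-9] also matches punctuation)
```

The last call shows that [^a-zA-Z0-9] means "not an ASCII letter or digit" rather than strictly "non-ASCII", so ASCII punctuation is treated the same way as the CJK characters.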
CodePudding user response:
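With lookahead and lookbehind, you can split the string at each point where a non-letter is followed by an ASCII letter, then strip the whitespace from each piece: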
s1 = 'LOGO 设计 SKY आकाश'
st = re.split(r'(?<=[^a-zA-Z])(?=[a-zA-Z])',s1)
[re.sub(r'\s+', '', e) for e in st]
['LOGO设计', 'SKYआकाश']
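To see what each step produces, here is the same idea as a standalone script. It is a sketch that assumes Python 3.7 or later, where re.split can split on an empty match; the names parts and cleaned are illustrative.

```python
import re

s1 = 'LOGO 设计 SKY आकाश'

# Split at every position that is preceded by a non-letter
# and followed by an ASCII letter (here: just before 'SKY').
parts = re.split(r'(?<=[^a-zA-Z])(?=[a-zA-Z])', s1)
print(parts)    # ['LOGO 设计 ', 'SKY आकाश']

# Remove every run of whitespace inside each piece.
cleaned = [re.sub(r'\s+', '', part) for part in parts]
print(cleaned)  # ['LOGO设计', 'SKYआकाश']
```

Unlike the lookaround substitution, this version removes all whitespace inside each piece and does not enforce the 3-character minimum.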