I want to find strings that contain no 。 character, except for an optional single occurrence at the very end of the string.
I searched for tips and found patterns like the one below, but they didn't solve my problem:
^(?!\.)(?!.*\.$)(?!.*\.\.)[a-zA-Z0-9_.]+$
(?!\.) - don't allow . at start
(?!.*\.\.) - don't allow 2 consecutive dots
(?!.*\.$) - don't allow . at end
I tried this:
import re

str_l = ["aaa。bbb。", "aaa。", "aaa"]
for str1 in str_l:
    res1 = re.search(r'(.*?!。*$)', str1)  # if 。 is not in the string, return True
    res2 = re.search(r'(?<!(。)。$)', str1)  # if 。 only appears at the end of the string, return True (doesn't work)
    print(res1, res2)
I want to combine res1 and res2 into one regex, so that the three strings give False, True, True respectively.
Answer:
You can use:
import re

str_l = ["aaa。bbb。", "aaa。", "aaa"]
for str1 in str_l:
    print(str1, '=>', bool(re.search(r'^[^。]*。?$', str1)))
Output:
aaa。bbb。 => False
aaa。 => True
aaa => True
See the Python demo. Details:
^ - start of string
[^。]* - zero or more chars other than 。
。? - an optional 。
$ - end of string.
To obtain the valid strings from the list using this regex, you can use
rx = re.compile(r'^[^。]*。?$')
print( list(filter(rx.search, str_l)) )
# => ['aaa。', 'aaa']
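One subtlety worth knowing: in Python, `$` also matches just before a trailing newline, so `^[^。]*。?$` accepts "aaa。\n". If that matters for your data, a minimal sketch using `re.fullmatch`, which requires the whole string to match, avoids it. The extra test string "aaa。\n" is added here for illustration only.

```python
import re

# Same pattern without ^ and $: fullmatch anchors it to the whole string,
# so a newline after the trailing 。 is not silently accepted.
rx = re.compile(r"[^。]*。?")

str_l = ["aaa。bbb。", "aaa。", "aaa", "aaa。\n"]
results = [bool(rx.fullmatch(s)) for s in str_l]
print(results)  # [False, True, True, False]

# For comparison, the ^...$ version accepts the string with a trailing newline
print(bool(re.search(r"^[^。]*。?$", "aaa。\n")))  # True
```

For the strings in the question, both versions give the same result; they differ only when the input may end with a newline.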
Answer:
This can be done with the following code.
import re
p = re.compile("^(?:(?!。).)*(。$)?(?!.*。).*$")
l = [
"aaa。bbb。",
"aaa bbb。", # matches because only at end
"aaa。bbb",
"。aaa bbb",
"aaa bbb", # matches because none found
]
print([s for s in l if p.match(s)])
Which results in:
['aaa bbb。', 'aaa bbb']
A full explanation of the pattern is available on regex101.com.
The only advantage of this pattern over the much terser ^[^。]*。?$ is that it also works with multi-character strings, not just a single character. For example, to match strings that may end with "foo" but must not contain "foo" anywhere earlier, you could use ^(?:(?!foo).)*(foo$)?(?!.*foo).*$.
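To make that generalization concrete, here is a minimal sketch that builds the same pattern for an arbitrary substring. The helper name `only_at_end_pattern` is hypothetical, and the `re.escape` call is an addition not in the answer above, so that tokens containing regex metacharacters such as `.` are matched literally.

```python
import re

def only_at_end_pattern(token: str) -> re.Pattern:
    """Hypothetical helper: compile a pattern matching strings in which
    `token` does not occur, except optionally once at the very end."""
    t = re.escape(token)  # treat the token literally, e.g. "." becomes "\."
    return re.compile(rf"^(?:(?!{t}).)*({t}$)?(?!.*{t}).*$")

p = only_at_end_pattern("foo")
words = ["foo", "barfoo", "foobar", "foofoo", "bar", "fofoo"]
print([w for w in words if p.match(w)])  # ['foo', 'barfoo', 'bar', 'fofoo']
```

As in the original pattern, `.` does not match newlines by default, so multi-line input would need the `re.DOTALL` flag.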
However, it is about 60% slower (roughly 1.6 times the runtime in this test). Here is the benchmark and its results:
import re
import timeit
a = re.compile("^(?:(?!。).)*(。$)?(?!.*。).*$")
b = re.compile("^[^。]*。?$")
l = [
"aaa。bbb。",
"aaa bbb。", # matches because only at end
"aaa。bbb",
"。aaa bbb",
"aaa bbb", # matches because none found
]
print(
timeit.timeit(
"matches = [s for s in l if a.match(s)]",
setup="from __main__ import (l, a)",
)
)
print(
timeit.timeit(
"matches = [s for s in l if b.match(s)]",
setup="from __main__ import (l, b)",
)
)
Which gives (total seconds for timeit's default 1,000,000 runs; exact numbers will vary by machine):
2.6208932230000004
1.6510743480000003
Answer:
Another approach is to split on 。. If 。 occurs only at the end of the string, the split gives two items and the last one is empty. If 。 does not occur at all, the list has length 1. Any other result means the string is invalid.
str_l = ["aaa。bbb。", "aaa。", "aaa", "。", "。 ", "。。"]
for str1 in str_l:
    lst = str1.split("。")
    nr = len(lst)
    # valid if 。 never occurs, or occurs once with nothing after it
    print(f"'{str1}' -> {nr == 1 or (nr == 2 and lst[1] == '')}")
Output:
'aaa。bbb。' -> False
'aaa。' -> True
'aaa' -> True
'。' -> True
'。 ' -> False
'。。' -> False
See a Python demo.
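A related non-regex check, offered as a sketch rather than as part of the answer above: because 。 is a single character that may appear only as the very last character, the condition is equivalent to 。 not occurring anywhere in `s[:-1]`. The function name `only_trailing_stop` is hypothetical, and the empty string is added to the test list for illustration.

```python
def only_trailing_stop(s: str) -> bool:
    # s[:-1] drops the last character (an empty string stays empty),
    # so a single trailing 。 is allowed but any earlier 。 is not.
    return "。" not in s[:-1]

str_l = ["aaa。bbb。", "aaa。", "aaa", "。", "。 ", "。。", ""]
for str1 in str_l:
    print(f"'{str1}' -> {only_trailing_stop(str1)}")
```

This relies on the separator being exactly one character; for multi-character tokens the split or regex approaches are safer.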