I have a string like this:
str1 = "ZZZ。10月,AAA。11月2日,BBB。CCC。3日,DDD。EEE。12月,FFF"
I want to split this string wherever two conditions hold at the same time: the piece after the split point starts with 日 or 月 (preceded by at most two characters), and the piece before it ends with a period 。. The result should look like this:
# ZZZ。 / 10月,AAA。/ 11月2日,BBB。CCC。/3日,DDD。EEE。/12月,FFF
My current idea is to split on the period first, then merge the pieces according to the second rule (日 or 月). The code looks like this:
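import re

str1 = "ZZZ。10月,AAA。11月2日,BBB。CCC。3日,DDD。EEE。12月,FFF"
res = []  # collected entries
for i, item in enumerate(re.split(r'(?<=。)', str1)):
    if i == 0:
        cache = item
    else:
        if re.match(r'(^.{0,2}日)|(^.{0,2}月)', item):
            # A new entry starts here: store the previous one.
            res.append(cache)
            cache = item
        else:
            # Continuation of the current entry: append instead of overwriting.
            cache += item
res.append(cache)
print(res)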
But I was wondering whether a combined check of this form:
re.match(r'(^.{0,2}日)|(^.{0,2}月)', item) and re.match(r'。$', item)
can be done directly in one loop, or with a single simple regex?
CodePudding user response:
You can use re.split with the following pattern:
(?<=。)(?=\s*\d{1,2}[日月])
See the regex demo. Details:
(?<=。) - matches a location right after a 。 (ideographic full stop)
(?=\s*\d{1,2}[日月]) - that is immediately followed by zero or more whitespace characters, then one or two digits, and then 日 or 月.
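Since both parts of the pattern are lookarounds, each match is an empty string: the pattern marks a position and consumes nothing. As a rough illustration (the variable names here are just for this sketch), you can list those positions with re.finditer and check that each one sits right after a 。:

```python
import re

text = "ZZZ。10月,AAA。11月2日,BBB。CCC。3日,DDD。EEE。12月,FFF"
boundary = re.compile(r'(?<=。)(?=\s*\d{1,2}[日月])')

# Every match is zero-width, so start() == end() marks a split position.
positions = [m.start() for m in boundary.finditer(text)]

for pos in positions:
    # Show the character just before the boundary and the text right after it.
    print(pos, repr(text[pos - 1]), repr(text[pos:pos + 3]))
```

Note that splitting on such empty matches with re.split needs Python 3.7 or newer.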
See the Python demo:
import re
text = "ZZZ。10月,AAA。11月2日,BBB。CCC。3日,DDD。EEE。12月,FFF"
print( re.split(r'(?<=。)(?=\s*\d{1,2}[日月])', text) )
# => ['ZZZ。', '10月,AAA。', '11月2日,BBB。CCC。', '3日,DDD。EEE。', '12月,FFF']
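If you would rather keep the looser rule from the question (up to two arbitrary characters before 日 or 月, not necessarily digits), a variant along these lines might work. This is only a sketch: split_entries and LOOSE_BOUNDARY are made-up names, and it assumes Python 3.7+, where re.split splits on empty matches.

```python
import re

# Hypothetical variant: mirrors the question's `.{0,2}` prefix instead of `\d{1,2}`.
LOOSE_BOUNDARY = re.compile(r'(?<=。)(?=.{0,2}[日月])')


def split_entries(text):
    """Split after each 。 that is followed by at most two characters and then 日 or 月."""
    return LOOSE_BOUNDARY.split(text)


sample = "ZZZ。10月,AAA。11月2日,BBB。CCC。3日,DDD。EEE。12月,FFF"
print(split_entries(sample))
```

On the sample string this should give the same five pieces as the digit-based pattern. The trade-off is that the looser prefix also splits before words such as 今日, so the \d{1,2} version is probably safer when only numeric dates should start a new entry.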