How to ensure that at least one of sign A and sign B appears in the regex?

import re

s_l = ["春天年初, ...","1999年", "很多年以前"]
for front_part in s_l:
    # Current attempt: the optional suffix lets a bare 年 match too
    match = re.search(r'\d*年[初末底前]?', front_part)
    idx_year = match.end() if match else 0
    print(idx_year)

I want to find the end index of a substring that contains 年, with the extra condition that either digits (sign A) come before 年 or one of [初末底前] (sign B) comes right after 年. For example, for s_l it should return 4, 5, 0.

One idea is to split the regex into two searches, like

re.search(r'[\d]+[年]{1}',front_part) or re.search(r'[年]{1}[初末底前]{0,1}',front_part)

but that is too complex. Another idea is to use (?=...), but I don't understand how it works or how to use it. Any suggestions?

CodePudding user response：

Try this:

import re

lst = ["春天年初, ...","1999年", "很多年以前"]
# Digits before 年 (sign A), or one of 初末底前 right after 年 (sign B)
pattern = re.compile(r"\d+年|年[初末底前]")
for i, s in enumerate(lst):
  matches = list(pattern.finditer(s))
  if matches:
    print([(i, m.span()) for m in matches])

This prints [(0, (2, 4))] and then [(1, (0, 5))]. The first element of each tuple is the index of the matching string in lst. The second element is the (start, end) span of the match inside that string. Strings that do not satisfy either condition, such as "很多年以前", print nothing.

CodePudding user response：

You can use a lookbehind assertion to match an occurrence of 年 that's preceded by a digit. Use an alternation pattern to match one that's followed by [初末底前]:

pattern = re.compile(r'(?<=\d)年|年[初末底前]')
print([match.end() if match else 0 for match in map(pattern.search, s_l)])

This outputs:

[4, 5, 0]
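
One thing to watch with the alternation: when both signs are present (for example "2020年底"), the first alternative wins and the suffix is not part of the match. The sketch below compares the pattern above with two variants that are only assumptions for illustration, not part of the original answer: WITH_SUFFIX also consumes an optional suffix after a digit-preceded 年, and LOOKAHEAD uses the (?=...) form mentioned in the question, which checks the suffix without consuming it. The names BASIC, WITH_SUFFIX, LOOKAHEAD and end_index, and the extra sample "2020年底", are made up here.

```python
import re

# Pattern from the answer above: 年 after a digit (sign A), or 年 plus a suffix (sign B).
BASIC = re.compile(r'(?<=\d)年|年[初末底前]')
# Hypothetical variant: also consume an optional suffix after a digit-preceded 年.
WITH_SUFFIX = re.compile(r'(?<=\d)年[初末底前]?|年[初末底前]')
# Hypothetical variant: (?=...) only checks that a suffix follows, without consuming it.
LOOKAHEAD = re.compile(r'(?<=\d)年|年(?=[初末底前])')

def end_index(pattern, text):
    """Return the end offset of the first match, or 0 when nothing matches."""
    match = pattern.search(text)
    return match.end() if match else 0

samples = ["春天年初, ...", "1999年", "很多年以前", "2020年底"]
variants = [("basic", BASIC), ("with_suffix", WITH_SUFFIX), ("lookahead", LOOKAHEAD)]
for name, pattern in variants:
    print(name, [end_index(pattern, s) for s in samples])
# basic [4, 5, 0, 5]
# with_suffix [4, 5, 0, 6]
# lookahead [3, 5, 0, 5]
```

Which variant fits depends on whether the returned index should point just past 年 or just past the suffix. The lookahead version always stops right after 年, so it gives 3 instead of 4 for "春天年初, ...".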