Regex: possibly two patterns found in one text

Time:10-13

I have a specific pattern, but the text to be processed can change randomly.
The text I am currently trying to filter with a regex (re.findall, Python v3.9.13) is as follows:
"ABC9,10.11A5:6,7:8.10BC1"

I am using the following regular expression: r"([ABC]{1,})(([0-9]{1,}[,.:]{0,}){1,})"

The current result is:
[("ABC", "9,10.11", "11"), ("A", "5:6,7:8.10", "10"), ("BC", "1", "1")]

What I am looking for as result should be:
[("ABC", "9,10.11"), ("A", "5:6,7:8.10"), ("BC", "1")]

I don't understand why the last number of the second part is always repeated as a third element.
Please help.

CodePudding user response:

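When a pattern contains more than one capture group, re.findall returns a tuple with the contents of every capture group for each match. In your case the repeated last number comes from the inner capture group around [0-9]{1,}[,.:]{0,}. Because that group is itself repeated by {1,}, it keeps only the text of its final iteration, which is the last number. Making it a non-capturing group resolves the issue: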

([ABC]{1,})((?:[0-9]{1,}[,.:]{0,}){1,})

In Python:

import re

s = "ABC9,10.11A5:6,7:8.10BC1"
re.findall(r"([ABC]{1,})((?:[0-9]{1,}[,.:]{0,}){1,})", s)
# [('ABC', '9,10.11'), ('A', '5:6,7:8.10'), ('BC', '1')]
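
To make the cause visible, here is a minimal sketch that inspects the groups of the first match with the original pattern. It then shows one possible alternative that leaves the pattern untouched and selects groups 1 and 2 through re.finditer. The variable names (original, first, pairs, compact) are only illustrative, and the shortened pattern at the end is an assumed equivalent spelling, not something taken from the question.

```python
import re

s = "ABC9,10.11A5:6,7:8.10BC1"

# Pattern from the question: group 3 sits inside group 2 and is repeated.
original = r"([ABC]{1,})(([0-9]{1,}[,.:]{0,}){1,})"

# A repeated capture group only remembers its final iteration,
# so group 3 holds the last number of the run.
first = re.search(original, s)
print(first.groups())  # ('ABC', '9,10.11', '11')

# Alternative that keeps the pattern as-is: pick groups 1 and 2 explicitly.
pairs = [(m.group(1), m.group(2)) for m in re.finditer(original, s)]
print(pairs)  # [('ABC', '9,10.11'), ('A', '5:6,7:8.10'), ('BC', '1')]

# Shorter spelling of the non-capturing fix: + for {1,} and * for {0,}.
compact = r"([ABC]+)((?:[0-9]+[,.:]*)+)"
print(re.findall(compact, s) == pairs)  # True
```

The finditer variant can be handy when the inner group is still needed elsewhere, while the non-capturing group is the simpler choice if it is not.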