I have a specific pattern, but the text to be processed can change randomly.
The text I am currently trying to filter with a regex (Python's re.findall, Python v3.9.13) is as follows:
"ABC9,10.11A5:6,7:8.10BC1"
I am using the following regex expression: r"([ABC]{1,})(([0-9]{1,}[,.:]{0,}){1,})"
The current result is:
[("ABC", "9,10.11", "11"), ("A", "5:6,7:8.10", "10"), ("BC", "1", "1")]
What I am looking for as result should be:
[("ABC", "9,10.11"), ("A", "5:6,7:8.10"), ("BC", "1")]
I don't understand why the last number of the second part always shows up again as an extra third item.
Please help.
Answer:
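Since you are using re.findall, the output contains the contents of every capture group in the pattern. The repeated last number comes from the capture group around [0-9]{1,}[,.:]{0,}. When a capturing group is itself repeated by a quantifier (here {1,}), it only keeps the text matched by its final iteration, which is why it yields "11", "10" and "1". Making that group non-capturing with (?:...) resolves the issue: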
([ABC]{1,})((?:[0-9]{1,}[,.:]{0,}){1,})
In Python:
import re
s = "ABC9,10.11A5:6,7:8.10BC1"
print(re.findall(r"([ABC]{1,})((?:[0-9]{1,}[,.:]{0,}){1,})", s))
# [('ABC', '9,10.11'), ('A', '5:6,7:8.10'), ('BC', '1')]
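To see the "last iteration wins" rule in isolation, here is a small sketch using only Python 3's standard re module. It reproduces the original three-item output and then shows the same fix written with the shorthand quantifiers + and * ({1,} is equivalent to +, {0,} to *). The shorthand rewrite and the re.finditer variant are my own illustrations, not part of the original answer, and the variable names (original, fixed, pairs) are just placeholders.

```python
import re

s = "ABC9,10.11A5:6,7:8.10BC1"

# A capturing group repeated by a quantifier keeps only the text from its
# LAST iteration; earlier iterations are overwritten.
m = re.match(r"(\d)+", "123")
print(m.group(0), m.group(1))  # 123 3

# Same effect in the original pattern: group 3 holds only the final
# repetition of [0-9]{1,}[,.:]{0,}, so findall reports it as a third item.
original = re.findall(r"([ABC]{1,})(([0-9]{1,}[,.:]{0,}){1,})", s)
print(original)
# [('ABC', '9,10.11', '11'), ('A', '5:6,7:8.10', '10'), ('BC', '1', '1')]

# Fixed pattern with shorthand quantifiers; (?:...) does not capture.
fixed = re.compile(r"([ABC]+)((?:[0-9]+[,.:]*)+)")
print(fixed.findall(s))
# [('ABC', '9,10.11'), ('A', '5:6,7:8.10'), ('BC', '1')]

# re.finditer yields Match objects, so the groups can be read explicitly.
pairs = [(match.group(1), match.group(2)) for match in fixed.finditer(s)]
print(pairs)
```

re.finditer is handy when you later need match positions (match.start(), match.end()) or want to add more groups without changing the shape of the result.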