I'm writing a data analysis script that checks the cells of an Excel sheet for occurring error codes. For each cell I iterate through my error code list and check, for every single code, whether there is a regex match in that cell.
Some codes have 4 digits and some have 6.
The problem is that every 6-digit code that contains the digit sequence of one of the 4-digit codes also produces a regex match for that 4-digit code, so it is counted even though the 4-digit code doesn't occur in the cell.
Here is a small code example that makes the problem clear, I think.
import re

errorcodes = [1234, 123456]
cell = "This is the cell containing the error 123456"
counter = 0
for i in range(2):
    # A plain substring search: 1234 is also found inside 123456
    if re.search(str(errorcodes[i]), cell):
        counter += 1
if counter == 2:
    print("This is the wrong number of errors")
elif counter == 1:
    print("This is the right number of errors")
CodePudding user response:
The regex search method is being asked to look for 1234 in the string 123456, so it does find a match. But of course it also finds a match when you look for 123456. What you want is to match only the whole error code.
You can do this by searching for the code between word boundaries. A word boundary is signified by the regex metacharacter \b, which you can use like this:
re.search(rf"\b{errorcodes[i]}\b", cell)
As part of a revised version of your code:
import re

errorcodes = [1234, 123456]
cell = "This is the cell containing the error 123456"
counter = 0
for i in range(2):
    # \b on both sides means the code must match as a whole number
    if re.search(rf"\b{errorcodes[i]}\b", cell):
        counter += 1
if counter == 2:
    print("This is the wrong number of errors")
elif counter == 1:
    print("This is the right number of errors")
I used f-strings (available since Python 3.6) to make it easier to build the search regex.
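The same idea can be wrapped in a small helper. This is only a sketch: the function name count_error_codes and the second test cell are made up for illustration, and re.escape is added as a precaution in case a code ever contains regex metacharacters.

```python
import re


def count_error_codes(cell, errorcodes):
    """Count how many codes from errorcodes occur in cell as whole words."""
    counter = 0
    for code in errorcodes:
        # \b on both sides rejects matches that are part of a longer number
        pattern = rf"\b{re.escape(str(code))}\b"
        if re.search(pattern, cell):
            counter += 1
    return counter


print(count_error_codes("This is the cell containing the error 123456", [1234, 123456]))  # 1
print(count_error_codes("Errors 1234 and 123456", [1234, 123456]))  # 2
```

Note that \b also treats letters and underscores as word characters, so a code glued to a letter (for example E1234) would not be counted. If that matters for your data, the lookarounds (?<!\d) and (?!\d) around the code restrict the check to neighbouring digits only.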
CodePudding user response:
Thanks for your approach. I tried this, but it doesn't work. I also tried
re.search(r"\D{errorcodes[i]}", cell)
but that doesn't work either, and I don't know why.
CodePudding user response:
OK, I found my mistake; your solution works. Thank you very much.
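The thread doesn't say what the mistake was. For anyone who hits the same wall, here is a short sketch of why the \D attempt quoted above cannot work as written. It assumes the same cell as in the question; the variable names are made up for illustration.

```python
import re

cell = "This is the cell containing the error 123456"
code = 1234

# Without the f prefix nothing is interpolated: the braces and the name
# inside them are treated as pattern text, so the code is never searched for.
no_prefix = re.search(r"\D{errorcodes[i]}", cell)

# With the f prefix, \D only checks the character in front of the code,
# so 1234 still matches at the start of 123456.
only_before = re.search(rf"\D{code}", cell)

# Word boundaries check both sides of the code.
both_sides = re.search(rf"\b{code}\b", cell)

print(no_prefix)                 # None
print(only_before is not None)   # True
print(both_sides)                # None
```

So even with the f prefix added, \D alone would still count 1234 for this cell, and it would also miss a code that sits at the very start of a cell, because \D needs a character to match.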