How to find specific regex in Python

Time:11-23

I want to write a data analysis script, so I'm checking the cells of an Excel sheet for error codes. For each cell, I iterate through my list of error codes and check whether each code produces a regex match in that cell.

Some codes have 4 digits and some have 6.

The problem is that any 6-digit code that contains the digit sequence of a 4-digit code also produces a regex match for that 4-digit code, so the 4-digit code is counted even though it doesn't occur in the cell.

Here is a small code example that I think makes the problem clear.

import re

errorcodes = [1234, 123456]
cell = "This is the cell containing the error 123456"
counter = 0

for i in range(2):
    # A plain search also finds 1234 inside 123456
    if re.search(str(errorcodes[i]), cell):
        counter += 1

if counter == 2:
    print("This is the wrong number of errors")
elif counter == 1:
    print("This is the right number of errors")

CodePudding user response:

re.search is being asked to look for 1234 anywhere in the cell, and the cell contains 123456, so it does find a match. Of course it also finds a match when you look for 123456. What you want is a match only on the whole error code.

You can do this by searching the string between word boundaries. A word boundary is signified by the regex metacharacter \b, which you can use like this:

re.search(rf"\b{errorcodes[i]}\b", cell)

As part of a revised version of your code:

import re

errorcodes = [1234, 123456]
cell = "This is the cell containing the error 123456"
counter = 0

for i in range(2):
    if re.search(rf"\b{errorcodes[i]}\b", cell):
        counter += 1

if counter == 2:
    print("This is the wrong number of errors")
elif counter == 1:
    print("This is the right number of errors")

I used an f-string (available since Python 3.6), combined with the raw-string prefix as rf"...", to make it easier to build the search regex.
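
The same idea can be wrapped in a small helper that works for a code list of any length. This is only a sketch: the function name count_error_codes is made up for illustration, and re.escape is added on the assumption that some codes might one day contain regex metacharacters (for plain digits it changes nothing).

```python
import re

def count_error_codes(cell, codes):
    """Count how many of the given codes occur in cell as whole tokens."""
    count = 0
    for code in codes:
        # \b on both sides rejects a code that is only part of a longer one
        pattern = rf"\b{re.escape(str(code))}\b"
        if re.search(pattern, cell):
            count += 1
    return count

cell = "This is the cell containing the error 123456"
print(count_error_codes(cell, [1234, 123456]))  # prints 1
```

Iterating over the list directly avoids hard-coding range(2), so the loop keeps working when more codes are added.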

CodePudding user response:

Thanks for your approach. I tried it, but it doesn't work. I also tried

    re.search(r"\D{errorcodes[i]}", cell) 

but that doesn't work either, and I don't know why.
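
A possible explanation for the \D attempt, sketched under the assumption that the line was used exactly as written: without the f prefix, the braces are not interpolated, so the pattern contains the literal text rather than the code. Even with the prefix, \D only checks the character before the code and must consume a real character, so the longer code still matches by its first four digits. Digit-only lookarounds are one alternative to \b; the helper name has_code is hypothetical.

```python
import re

cell = "This is the cell containing the error 123456"
code = 1234

# No f prefix: "{code}" stays literal text, so no real code can match.
no_prefix = re.search(r"\D{code}", cell)

# With the f prefix, \D consumes the space before "123456" and 1234 still
# matches as a prefix of the longer code, so the false positive remains.
with_prefix = re.search(rf"\D{code}", cell)

def has_code(code, text):
    # Zero-width lookarounds: no digit directly before or after the code.
    return re.search(rf"(?<!\d){code}(?!\d)", text) is not None

print(no_prefix)                # None
print(with_prefix is not None)  # True
print(has_code(1234, cell))     # False
print(has_code(123456, cell))   # True
print(has_code(1234, "E1234"))  # True
```

Unlike \b, the lookaround version also accepts a code that directly follows a letter, as in "E1234", because \b sees no boundary between a letter and a digit.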

CodePudding user response:

OK, I found my mistake; your solution works. Thank you very much.
