Writing a Regex for finding a pattern from one arbitrary string in another

Time:10-21

I'm trying to write a regex that captures the order-preserving pattern of words in "field" and uses it to find matches in "text". The regex should also find partial matches, as in the first example below.

Making words optional might be one approach, but then it starts to match arbitrary stuff.

I'd like help writing a function that takes "field", builds a regex out of it, and then finds that pattern in "text". Partial matches are okay, too.

Both input strings can be anything, so the regex should be generic enough to work on any input.

Please ask clarifying questions if needed! Any observations/directions where I'm going wrong that you can point out would be very helpful.

def regexp(field, text):

    import re
    
    key = re.split(r'\W', field)
    regex = "^.*"
    for x in key:
        if len(x) > 0:
            #regex += "(" + x + ")?"
            regex += x
            regex += ".*"

    regex = r'{}'.format(regex)
    pattern = re.compile(regex, re.IGNORECASE)
    matches = list(re.finditer(pattern, text))
    print(matches, "\n", pattern)

    if len(matches)>0:
        return True
    else:
        return False 

Examples:

print(regexp("F1 gearbox: 0-400 m","f1 gearbox")) # this should match
#this is a partial match, my regex should be able to find this match

print(regexp("0-100 kmph" , "100-0 kmph")) # this should not match
#order of characters/words in my regex/text should match

print(regexp("F1 gearbox: 0-400 m","none")) # this should not match
#if I try to use "(word)?" in my regex then everything becomes optional
#and it starts to match random words like "none", "sbhsuckjcsak", etc.
#this obviously is not expected.

print(regexp("Combined* (ECE EUDC) (l/100 km)","combined ece eudc")) #this should match
#because it's a partial match and special characters are not important
#for my matching use case
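
The problem with optional groups described above can be reproduced in a few lines. This is only an illustrative sketch: the two patterns are written out by hand for the tokens "f1" and "gearbox" rather than generated by the function, and the variable names are made up for the example.

```python
import re

# Hand-written pattern (an assumption for illustration): every token
# is wrapped in an optional group, as in the commented-out line above.
all_optional = re.compile(r"^.*(f1)?.*(gearbox)?.*", re.IGNORECASE)

# Each group may match nothing, so the pattern behaves like "^.*"
# and accepts any text, including unrelated words and the empty string.
matches_unrelated = all_optional.search("none") is not None
matches_empty = all_optional.search("") is not None

# Requiring every token, in order, rejects the unrelated text.
all_required = re.compile(r"^.*f1.*gearbox", re.IGNORECASE)
rejects_unrelated = all_required.search("none") is None

print(matches_unrelated, matches_empty, rejects_unrelated)
```

So at least some tokens have to stay mandatory for the pattern to discriminate between related and unrelated text.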

CodePudding user response:

Your function is already returning the right values for the examples you posted. You just have to fix the order of "text" and "field". I also made the code shorter and (in my opinion at least) easier to read:

def regexp(text, field):

    import re
    
    key = re.split(r'\W', field)
    regex = rf'^.*{".*".join(key)}'
    pattern = re.compile(regex, re.IGNORECASE)
    matches = re.findall(pattern, text)
    # print(matches, "\n", pattern)

    return len(matches)>0
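
A slightly more defensive variant of the same idea might look like the sketch below. It is not part of the answer above: the name `ordered_token_match` is hypothetical, and returning False for a field without any word tokens is an assumption. It drops the empty strings that `re.split` produces and passes each token through `re.escape`, which is a no-op for tokens made only of word characters but keeps the pattern safe if the tokenizer is changed later.

```python
import re

def ordered_token_match(text, field):
    """Return True if the word tokens of `field` occur in `text`, in order."""
    # Split on runs of non-word characters and drop empty strings.
    tokens = [t for t in re.split(r"\W+", field) if t]
    if not tokens:
        # Assumption: a field with no word tokens matches nothing.
        return False
    # Join the escaped tokens with ".*" so anything may sit between them,
    # but their relative order must be preserved.
    pattern = re.compile(".*".join(re.escape(t) for t in tokens), re.IGNORECASE)
    # search() scans the whole text, so no leading "^.*" is needed.
    return pattern.search(text) is not None

print(ordered_token_match("F1 gearbox: 0-400 m", "f1 gearbox"))
print(ordered_token_match("0-100 kmph", "100-0 kmph"))
print(ordered_token_match("F1 gearbox: 0-400 m", "none"))
print(ordered_token_match("Combined* (ECE EUDC) (l/100 km)", "combined ece eudc"))
```

Note that tokens are matched as substrings here, so "100" would also be found inside "1000", and `.` does not cross newlines unless `re.DOTALL` is added.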