Recursivey searching (segments of) a sentence in a lexicon (python)-CodePudding

This program should return True if it is possible to generate the given sequence x from a given list of segments seglist. After finding one possible solution it should stop.

I tried replacing and repositioning the return commands, but there's always a different problem coming up.

def valid_sequence(x, seglist): 
    if x in seglist:
        return True

    for i in seglist:
        if x.startswith(i):
            return valid_sequence(x[len(i):], seglist)
    return False

This returns the correct Boolean for sequence 'abc'and seglist ['a', 'ab', 'bc', 'c'], ['a', 'b', 'c'], and ['ab', 'bc'], but not for seglist ['a', 'ab', 'c'] because obviously it's going through the 'a' variant but won't be successfull and stops before going through 'ab', which it should.

I ran it through pythontutor and I understood some of the problems, but I could not determine how to fix them.

How do I write it so that it continues with segment 'ab' after taking the unsuccessful path of 'a'? Maybe I'm missing something else too, given that I can't wrap my head around this recursion.

Is it even remotely possible this way, or do I have to take a totally different approach?

CodePudding user response：

You were very close on this one. The main problem is that you should not return immediately in the recursive step. Notice that it will return False after the first loop that it is called. Consequently, the whole function will return False as well.

Said that, you should return only if the result is True:

def valid_sequence(x, seglist): 
    if x in seglist:
        return True

    for i in seglist:
        if x.startswith(i):
            if valid_sequence(x[len(i):], seglist):
                return True
    return False

This will fix the problem you mentioned, but there's one more to go. If x = 'abc' and seglist = ['ab', 'b', 'c'] the function will correctly return True. But it will also return True for x = 'abbc', x = 'abbbc' and so on. This happens because the 'b' is being used indefinitly. I suppose that's a not desired behavior. To overcome this we create a copy of the list for new calls and eliminate the used element:

import copy

def valid_sequence(x, seglist): 
    if x in seglist:
        return True

    for i in seglist:
        if x.startswith(i):
            newlist = copy.deepcopy(seglist)
            newlist.remove(i)
            
            if valid_sequence(x[len(i):], newlist):
                return True
    return False

CodePudding user response：

The problem with your code, as you've correctly identified, is that your program returns False prematurely, before it has exhausted all possible choices of strings in each position.

In particular, it returns False when running the following line:

return valid_sequence(x[len(i):], seglist)

If you think about it, the program should never be able to return False in that location since it potentially hasn't finished iterating over all the strings in seglist. However, you do want to return True if it has found a choice of string that completes the sequence.

Fortunately, only a slight modification is needed to fix this issue: check that the value returned is True before returning it. I've included the modified code below.

def valid_sequence(x, seglist): 
    if x in seglist:
        return True

    for i in seglist:
        if x.startswith(i):
            if valid_sequence(x[len(i):], seglist):
                return True
    return False