The problem

e.g.

list1 = ["a", "c", "b", "a", "b", "c", "a", "c", "b"]

list2 = ["a", "c", "a", "b"]

Possibilities:

1.

list1 = ["a", "c", "b", "a", "b", "c", "a", "c", "b"]

list2 = ["a", "c", "a", "b"]

those are indexes: [1, 2, 4, 5

2.

list1 = ["a", "c", "b", "a", "b", "c", "a", "c", "b"]

list2 = ["a", "c", "a", "b"]

those are indexes: [1, 2, 4, 9]

3.

list1 = ["a", "c", "b", "a", "b", "c", "a", "c", "b"]

list2 = ["a", "c", "a", "b"]

those are indexes: [1, 2, 7, 9]

4.

list1 = ["a", "c", "b", "a", "b", "c", "a", "c", "b"]

list2 = ["a", "c", "a", "b"]

those are indexes: [1, 6, 7, 9]

5.

list1 = ["a", "c", "b", "a", "b", "c", "a", "c", "b"]

list2 = ["a", "c", "a", "b"]

those are indexes: [4, 6, 7, 9]

Output

The output should be an array of indexes never used

for that above example it would be [2, 8] (because it never appears in the indexes)

I tried a whole amount of solutions and all I could do is determine whether list2 is a subsequence of list1 or not

CodePudding user response：

The following python code does the job I think. It is not necessarily optimal and uses only basic python capabilities.

target = [ "a", "c", "b", "a", "b", "c", "a", "c", "b" ]
template = [ "a", "c", "a", "b" ]

def getEmbedded(target, relNdx, template):
    embeddeds = []
    if len(template) == 1:
        #  If the template list is only one element long, then check if any matches in the target list.
        for i in range(len(target)):
            if template[0] == target[i]:
                embeddeds.append([ relNdx   i ])
    else:
        # If the first element of the template matches an element in the target,
        #   then check if the remainder of the template is embedded in the remainder of the target.
        for i in range(len(target) - len(template)):
            if template[0] == target[i]:
                rtn = getEmbedded(target[i 1:], relNdx   i   1, template[1:])
                for subSeq in rtn:
                    #  If any matches, then build up each match with were we are now.
                    embeddeds.append( [ relNdx   i ]   subSeq )
    return embeddeds

embeddeds = getEmbedded(target, 0, template)

usedNdxes = []
for i in range(len(target)):
    usedNdxes.append(False)
for embedded in embeddeds:
    for ndx in embedded:
        usedNdxes[ndx] = True
        
for i in range(len(usedNdxes)):
    if not usedNdxes[i]:
        print( i )

CodePudding user response：

I see I wrote a solution in parallel with petern0691, which seems to be very similar. I'll still add it for what its worth as it uses a bit more python libraries, like list comprehension, sets, etc.

I think the key for making an understandable solution for such problems is to use recursive algorithm as follows:

target = ["a", "c", "b", "a", "b", "c", "a", "c", "b"]
template = ["a", "c", "a", "b"]

solutions = []


def find_indices_in_list(list_in, start_index, value):
    if start_index>=len(list_in):
        return []
    return [i start_index for i, val in enumerate(list_in[start_index:]) if val == value]


def find_pattern(target_index_stack, target_start_index):
    if (len(target_index_stack) == len(template)):
        # solution found
        solutions.append(target_index_stack)
        return
    template_index = len(target_index_stack)
    next_letter = template[template_index]

    matching_target_indices = find_indices_in_list(target, target_start_index, next_letter)
    for target_index in matching_target_indices:
        find_pattern(target_index_stack   [target_index], target_index 1)


find_pattern([],0)
print(solutions)

used_target_indices = set([target_index for solution in solutions for target_index in solution])
unused_target_indices = set(range(len(target))) - used_target_indices
print(unused_target_indices)

What happens above is that you chop your problem into pieces as follows:

find the first letter of the template within the target ('a')
for every occurrence of this letter in the target, you want to find the second letter of the template in the remainder of the target.
In other words, the remaining problem is the same as your main problem, but with only the last 3 letters of the template (["c", "a", "b"]), and with only the remaining part of your target (["c", "b", "a", "b", "c", "a", "c", "b"] for the first occurrence).
This is where the recursion comes into play: the `find_pattern()' function will call itself to solve the remainder of the solution.
You keep on recursing until one of the following occurs:
- there is 'no template left', which means you have found a solution, you store it, and you go back
- you are at the end of your target, and no solution was found on this path and you go back