Word Ladder Problem with mandatory words in the middle-CodePudding

I'm working on an question which is about Word Ladder with some extra constraints (A list of words is given and the ladder is only constructed from these words, not the entire English language). There are 2 constraints in addition to the conventional Word Ladder problem:

minimize the alphabetic distance of the word ladder.

For example, "aab" and "aaa" have a distance of 1, since "b" is the second letter of the alphabet, and "a" is the first. "bbb" and "bwb" have a distance of 21, since "b" is the second letter of the alphabet, and "w" is the twenty-third.

we need to use one of a set of particular words in the ladder, say "detour".

This means that, given a list of words (>= 1) from the original words, at least one of these words must appear in the word ladder.

There is a complexity constraint on this question, which is O(Elog(W) Wlog(W))), where

E is the number of pairs of words which differ by exactly one letter

W is the total number of words

Note that this complexity does not involve any term related to the size of "detour" words.

# One example is (we denote the word as indices in the list):
words = [’aaa’,’bbb’,’bab’,’aaf’,’aaz’,’baz’,’caa’,’cac’,’dac’,’dad’,’ead’,’eae’,’bae’,’abf’,’bbf’]
start word = 0         # ("aaa")
end word = 1           # ("bbb")
detour = [12]          # ("bae")
# The result path is: [0, 6, 7, 8, 9, 10, 11, 12, 2, 1]
# corresponding to the list of the words: [’aaa’, ’caa’, ’cac’, ’dac’, ’dad’, ’ead’, ’eae’, ’bae’, ’bab’, ’bbb’]

Now my implementation is just run Dijkstra’s algorithm, which has a complexity of O(Elog(W)), on every combinations from the start word to one of the "detour" word, and from that detour word to the final target word. Calculate all and find one with the minimum alphabetic distance.

This is not good I think, I'm not sure whether it exceeds the time complexity. Simultaneously, I cannot understand why the complexity doesn't involve the size of the "detour" words. How to make sure the path is the shortest while it contains at least one of the "detour" words (and meet the complexity)?

Does anyone have better idea to solve this problem? Thank you very much.

CodePudding user response：

This can be done in exactly the same time complexity as Dijkstra's algorithm using a small augmentation. We replace every vertex v in the original graph with two vertices (v, 0) and (v, 1), denoting whether we've visited a detour yet or not. We're now searching for the shortest path from (start, x) to (end, 1), where x is 0 or 1 if start is or is not a detour, respectively.

The priority queue Q initializes with only (start, x) at a priority/distance of 0. The main loop over neighbors in Dijkstra's algorithm transforms into this (pseudocode):

while Q is not empty:
    (v, has_detour) <- extract_min_heap(Q)

    for neighbor, edge_cost in neighbors[v]:
        detour_status = has_detour | is_detour(neighbor)
        alt_cost = dist[v, has_detour]   edge_cost

        if alt_cost < dist[neigh, detour_status]:
            dist[neigh, detour_status] = alt_cost
            predecessor[neigh, detour_status] = (v, has_detour)
            add (neigh, detour_status) to Q with priority == alt_cost

Note that we don't construct the augmented graph G* explicitly through the original edge set, only implicitly through the auxiliary data structure of Dijkstra's. Of course, you could actually store the new graph, which is defined like so:

Given a directed graph G = (V, E),
Define a new digraph G* = (V*, E*):

Vertex set V* := V x {0,1}

Edge set E* := {((v,1), (w,1)) | (v,w) ∈ E} 
             ∪ {((v,0), (w,0)) | (v,w) ∈ E and w ∉ detours} 
             ∪ {((v,0), (w,1)) | (v,w) ∈ E and w ∈ detours}

Since our new graph (on which we actually run Dijkstra's) has 2|V| vertices and 2|E| edges, the new asymptotic runtime and space complexity is the same as it was in the original implementation of Dijkstra's. Make sure to use a set or hashmap-based data structure for implementing is_detour() in O(1).

For a full Python implementation, see below. The code changes related to detours are almost all in that main loop: the majority of the code is building the standard word ladder graph. Building the graph can take |V|^2 * word_len or |V|*(word_len^2) |E|, depending on how you do it.

def word_ladder(words: List[str],
                start_word_idx: int,
                end_word_idx: int,
                detour_idxs: List[int]) -> Optional[List[int]]:
    """Given a list of distinct equal length lowercase words,
      find a word ladder of minimum pairwise alphabetic distance
      from start to end if one exists, or return None otherwise"""

    def letter_dist(letter1: str, letter2: str) -> int:
        return abs(ord(letter1) - ord(letter2))

    detour_idx_set = set(detour_idxs)
    word_bases = collections.defaultdict(list)

    graph = collections.defaultdict(list)

    word_len = len(words[0])

    for i, word in enumerate(words):
        as_list = list(word)
        for j in range(word_len):
            old = as_list[j]
            as_list[j] = '0'
            word_base = ''.join(as_list)
            for other_word_idx in word_bases[word_base]:
                dist = letter_dist(old, words[other_word_idx][j])
                graph[i].append((other_word_idx, dist))
                graph[other_word_idx].append((i, dist))
            word_bases[word_base].append(i)
            as_list[j] = old

    distances = collections.defaultdict(lambda: math.inf)
    queue = []

    start_is_detour = 1 if start_word_idx in detour_idx_set else 0

    queue.append((0, start_word_idx, start_is_detour))
    distances[start_word_idx, start_is_detour] = 0

    parent = {}

    def reconstruct_ladder() -> List[int]:
        vert, detour_status = end_word_idx, 1
        path = []
        while (vert, detour_status) in parent:
            path.append(vert)
            vert, detour_status = parent[vert, detour_status]
        path.append(vert)
        path.reverse()
        return path

    while len(queue) != 0:
        distance, index, has_detour = heapq.heappop(queue)
        if distances[index, has_detour] != distance:
            continue

        for neighbor_idx, edge_cost in graph[index]:
            detour_status = has_detour | (neighbor_idx in detour_idx_set)

            if distance   edge_cost < distances[neighbor_idx, detour_status]:
                distances[neighbor_idx, detour_status] = distance   edge_cost
                parent[neighbor_idx, detour_status] = (index, has_detour)
                heapq.heappush(queue,
                               (distances[neighbor_idx, detour_status],
                                neighbor_idx, detour_status))

    if (end_word_idx, 1) not in parent:
        return None
    return reconstruct_ladder()

which, on your input, can be used like:

words = ['aaa', 'bbb', 'bab', 'aaf', 'aaz', 'baz', 'caa', 'cac', 'dac', 'dad', 'ead', 'eae', 'bae', 'abf', 'bbf']
start = 0
end = 1
detours = [12]

print(word_ladder(words=words, start_word_idx=start,
                  end_word_idx=end, detour_idxs=detours))

[0, 6, 7, 8, 9, 10, 11, 12, 2, 1]

Note that this implementation of Dijkstra doesn't use Decrease-Key, so the theoretical runtime is suboptimal. Of course, that's also true of most practical implementations; feel free to use a Fibonacci-Heap or similar if that's a concern.