There are two ways of implementing BFS to find the shortest path between two nodes. The first is to use a list of lists to represent the queue of paths. The other is to maintain a mapping from each node to its parent: when inspecting an adjacent node, record its parent, and at the end backtrack through the parent mapping to reconstruct the path. (See this post for more details: https://stackoverflow.com/a/8922151/13886907. Thanks to qiao for the answer and code to that question!)
I copied the code here.
First way:
def bfs(graph, start, end):
    # maintain a queue of paths
    queue = []
    # push the first path into the queue
    queue.append([start])
    while queue:
        # get the first path from the queue
        path = queue.pop(0)
        # get the last node from the path
        node = path[-1]
        # path found
        if node == end:
            return path
        # enumerate all adjacent nodes, construct a
        # new path and push it into the queue
        for adjacent in graph.get(node, []):
            new_path = list(path)
            new_path.append(adjacent)
            queue.append(new_path)
print(bfs(graph, 'A', 'F'))
Second way:
def backtrace(parent, start, end):
    path = [end]
    while path[-1] != start:
        path.append(parent[path[-1]])
    path.reverse()
    return path

def bfs(graph, start, end):
    parent = {}
    queue = []
    queue.append(start)
    while queue:
        node = queue.pop(0)
        if node == end:
            return backtrace(parent, start, end)
        for adjacent in graph.get(node, []):
            if node not in queue:
                parent[adjacent] = node  # <<<<< record its parent
                queue.append(adjacent)
print(bfs(graph, 'A', 'F'))
And the graph (a directed graph):
graph = {'A': ['C', 'D', 'B'],
         'B': ['C', 'E'],
         'C': ['E'],
         'D': ['F'],
         'E': ['F']}
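For reference, both snippets above print the same result on this graph, since A → D → F is the only two-edge route from A to F:

# output of print(bfs(graph, 'A', 'F')) for either version
# ['A', 'D', 'F']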
We can see that the second way saves memory, since the queue no longer needs to store whole paths: the space used by both the queue and the parent map is O(V), where V is the number of vertices. The final backtrace adds at most O(V) extra time.
So, is the second way superior to the first in every respect for finding the shortest path (or all paths) between two nodes in a directed graph? Can we regard the second way as an optimization of the basic version of BFS (the first way)?
CodePudding user response:
The second version is superior, because memory allocation also costs time. Namely, this line:
new_path = list(path)
...has a time complexity of O(k), where k is the length of path. Even in the best case, where the graph is just a single path from the source to the target node, the first version spends O(1) + O(2) + O(3) + ... + O(n) on the executions of this list(path) call, which is O(n²). The second version is O(n) in this "happy path" case. Things only get worse as the branching factor of the graph grows.
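To make the arithmetic explicit: copying a path of length k touches k list cells, and in the single-path case the path grows by one node per iteration, so the total work is 1 + 2 + ... + n = n(n+1)/2 cells, which is quadratic in n. A quick sanity check of that closed form (my own illustration, not from the original answer):

n = 10_000
assert sum(range(1, n + 1)) == n * (n + 1) // 2  # both equal 50005000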
Remarks on your code
Both code snippets have issues:
The first version has no protection against going around in cycles. You should add a visited marker so that the same node is not visited twice.
The second version seems to have such protection, but it is not good enough. It checks whether the node is already in the queue, but even if it is not, it may have been in the queue earlier and already been dequeued; in that case it should not be revisited either. We can use parent to know whether a node was already visited.
So here are the corrected snippets:
def bfs_1(graph, start, end):
    queue = []
    visited = set()
    queue.append([start])
    visited.add(start)
    while queue:
        path = queue.pop(0)
        node = path[-1]
        if node == end:
            return path
        for adjacent in graph.get(node, []):
            if adjacent not in visited:
                visited.add(adjacent)
                new_path = list(path)
                new_path.append(adjacent)
                queue.append(new_path)
And
def backtrace(parent, start, end):
    path = [end]
    while path[-1] != start:
        path.append(parent[path[-1]])
    path.reverse()
    return path

def bfs_2(graph, start, end):
    parent = {}
    queue = []
    queue.append(start)
    parent[start] = None
    while queue:
        node = queue.pop(0)
        if node == end:
            return backtrace(parent, start, end)
        for adjacent in graph.get(node, []):
            if adjacent not in parent:
                parent[adjacent] = node  # <<<<< record its parent
                queue.append(adjacent)
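As a quick sanity check (my addition), both corrected functions return the same shortest path on the small letter graph from the question:

letter_graph = {'A': ['C', 'D', 'B'],
                'B': ['C', 'E'],
                'C': ['E'],
                'D': ['F'],
                'E': ['F']}

print(bfs_1(letter_graph, 'A', 'F'))  # ['A', 'D', 'F']
print(bfs_2(letter_graph, 'A', 'F'))  # ['A', 'D', 'F']

A further, independent improvement for both versions would be to use collections.deque with popleft() instead of queue.pop(0), since popping from the front of a Python list is itself O(n).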
Comparison
I used the following testing code to test the above algorithms:
import random
from timeit import timeit

def create_graph(size):
    graph = {}
    nodes = list(range(size))
    for i in nodes:
        graph[i] = set(random.choices(nodes, k=3))
        if i in graph[i]:
            graph[i].remove(i)
        graph[i] = list(graph[i])
    return graph

graph = create_graph(40000)

print("version 1")
print(bfs_1(graph, 1, 2))
print("time used", timeit(lambda: bfs_1(graph, 1, 2), number=10))
print()
print("version 2")
print(bfs_2(graph, 1, 2))
print("time used", timeit(lambda: bfs_2(graph, 1, 2), number=10))
See it run on repl.it
The generated graph has 40 000 nodes, each with up to three random outgoing edges. Most of the time the second algorithm is faster than the first, and the difference becomes more pronounced as the solution path gets longer.
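To isolate the effect of path length, one could also time both versions on a plain chain graph, where the shortest path visits every node. This is an illustrative sketch of my own (create_chain is not part of the test script above); following the earlier analysis, the expectation is that version 1 does quadratic total work copying ever longer paths while version 2 stays linear:

from timeit import timeit  # already imported by the test script above

# A chain 0 -> 1 -> 2 -> ... -> n-1: the shortest path from 0 to n-1 contains all n nodes.
def create_chain(n):
    return {i: [i + 1] for i in range(n - 1)}

chain = create_chain(5000)

print("version 1 on a chain")
print("time used", timeit(lambda: bfs_1(chain, 0, 4999), number=10))
print("version 2 on a chain")
print("time used", timeit(lambda: bfs_2(chain, 0, 4999), number=10))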