There are two ways of implementing BFS to find the shortest path between two nodes. The first is to use a list of lists to represent the queue of paths. The other is to maintain a mapping from each node to its parent: when inspecting an adjacent node, record its parent, and at the end backtrack through the parent mapping to reconstruct the path. (See this post for more details: https://stackoverflow.com/a/8922151/13886907. Thanks to qiao for the answer and code to that question!)
I copied the code here.
First way:
def bfs(graph, start, end):
    # maintain a queue of paths
    queue = []
    # push the first path into the queue
    queue.append([start])
    while queue:
        # get the first path from the queue
        path = queue.pop(0)
        # get the last node from the path
        node = path[-1]
        # path found
        if node == end:
            return path
        # enumerate all adjacent nodes, construct a
        # new path and push it into the queue
        for adjacent in graph.get(node, []):
            new_path = list(path)
            new_path.append(adjacent)
            queue.append(new_path)
print(bfs(graph, 'A', 'F'))
Second way:
def backtrace(parent, start, end):
    path = [end]
    while path[-1] != start:
        path.append(parent[path[-1]])
    path.reverse()
    return path

def bfs(graph, start, end):
    parent = {}
    queue = []
    queue.append(start)
    while queue:
        node = queue.pop(0)
        if node == end:
            return backtrace(parent, start, end)
        for adjacent in graph.get(node, []):
            if node not in queue:
                parent[adjacent] = node  # <<<<< record its parent
                queue.append(adjacent)
print(bfs(graph, 'A', 'F'))
And the graph (a directed graph):
graph = {'A': ['C', 'D', 'B'],
         'B': ['C', 'E'],
         'C': ['E'],
         'D': ['F'],
         'E': ['F']}
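For reference, both snippets above print the same result on this graph, since A → D → F is the only two-edge route from A to F:

# output of print(bfs(graph, 'A', 'F')) for either version
# ['A', 'D', 'F']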
We can see that the second way saves memory, since the queue no longer needs to store whole paths: the space used by both the queue and the parent map is O(V), where V is the number of vertices. The final backtrace adds at most O(V) extra time.
So, is the second way superior to the first in every respect for finding the shortest path (or all paths) between two nodes in a directed graph? Can we regard the second way as an optimization of the basic version of BFS (the first way)?
CodePudding user response:
The second version is superior, because memory allocation also costs time. Namely, this line:
new_path = list(path)
...has a time complexity of O(k), where k is the length of path. Even in the best case, where the graph is just a single path from the source to the target node, the first version spends O(1) + O(2) + O(3) + ... + O(n) on the executions of this list(path) call, which is O(n²). The second version is O(n) in this "happy path" case. Things only get worse as the branching factor of the graph grows.
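To make the arithmetic explicit: copying a path of length k touches k list cells, and in the single-path case the path grows by one node per iteration, so the total work is 1 + 2 + ... + n = n(n+1)/2 cells, which is quadratic in n. A quick sanity check of that closed form (my own illustration, not from the original answer):

n = 10_000
assert sum(range(1, n + 1)) == n * (n + 1) // 2  # both equal 50005000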
Remarks on your code
Both code snippets have issues:
The first version has no protection against going around in cycles. You should add a visited marker so that the same node is not visited twice.
The second version seems to have such protection, but it is not good enough. It checks whether the node is already in the queue, but even if it is not, it may have been in the queue earlier and already been dequeued; in that case it should not be revisited either. We can use parent to know whether a node was already visited.
So here are the corrected snippets:
def bfs_1(graph, start, end):
    queue = []
    visited = set()
    queue.append([start])
    visited.add(start)
    while queue:
        path = queue.pop(0)
        node = path[-1]
        if node == end:
            return path
        for adjacent in graph.get(node, []):
            if adjacent not in visited:
                visited.add(adjacent)
                new_path = list(path)
                new_path.append(adjacent)
                queue.append(new_path)
And
def backtrace(parent, start, end):
    path = [end]
    while path[-1] != start:
        path.append(parent[path[-1]])
    path.reverse()
    return path

def bfs_2(graph, start, end):
    parent = {}
    queue = []
    queue.append(start)
    parent[start] = None
    while queue:
        node = queue.pop(0)
        if node == end:
            return backtrace(parent, start, end)
        for adjacent in graph.get(node, []):
            if adjacent not in parent:
                parent[adjacent] = node  # <<<<< record its parent
                queue.append(adjacent)
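As a quick sanity check (my addition), both corrected functions return the same shortest path on the small letter graph from the question:

letter_graph = {'A': ['C', 'D', 'B'],
                'B': ['C', 'E'],
                'C': ['E'],
                'D': ['F'],
                'E': ['F']}

print(bfs_1(letter_graph, 'A', 'F'))  # ['A', 'D', 'F']
print(bfs_2(letter_graph, 'A', 'F'))  # ['A', 'D', 'F']

A further, independent improvement for both versions would be to use collections.deque with popleft() instead of queue.pop(0), since popping from the front of a Python list is itself O(n).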
Comparison
I used the following testing code to test the above algorithms:
import random
from timeit import timeit

def create_graph(size):
    graph = {}
    nodes = list(range(size))
    for i in nodes:
        graph[i] = set(random.choices(nodes, k=3))
        if i in graph[i]:
            graph[i].remove(i)
        graph[i] = list(graph[i])
    return graph

graph = create_graph(40000)

print("version 1")
print(bfs_1(graph, 1, 2))
print("time used", timeit(lambda: bfs_1(graph, 1, 2), number=10))
print()
print("version 2")
print(bfs_2(graph, 1, 2))
print("time used", timeit(lambda: bfs_2(graph, 1, 2), number=10))
See it run on repl.it
The generated graph has 40 000 nodes, each with up to three random outgoing edges. Most of the time the second algorithm is faster than the first, and the difference becomes more pronounced as the solution path gets longer.
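To isolate the effect of path length, one could also time both versions on a plain chain graph, where the shortest path visits every node. This is an illustrative sketch of my own (create_chain is not part of the test script above); following the earlier analysis, the expectation is that version 1 does quadratic total work copying ever longer paths while version 2 stays linear:

from timeit import timeit  # already imported by the test script above

# A chain 0 -> 1 -> 2 -> ... -> n-1: the shortest path from 0 to n-1 contains all n nodes.
def create_chain(n):
    return {i: [i + 1] for i in range(n - 1)}

chain = create_chain(5000)

print("version 1 on a chain")
print("time used", timeit(lambda: bfs_1(chain, 0, 4999), number=10))
print("version 2 on a chain")
print("time used", timeit(lambda: bfs_2(chain, 0, 4999), number=10))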