I have a DAG that represents multiple arbitrary boolean expressions. DAG consists of two types of the nodes:
- Leaf node (predicate that consists of attribute, operator, value(s))
- Intermediate node (AND/OR)
Each node in the graph can be reused in other expressions as subexpression.
To compute an OR
/AND
node for example, it's predicates must be computed first (or intermediate nodes).
Is there any optimal algorithm that can provide this sort of bottom-up, level-order traversal for DAG staring with predicate nodes?
CodePudding user response:
What you are looking for is topological sort
. Let me write the pseudocode with explanations for you. The idea is that if there is a path from u
to v
, then v
appears after u
in the ordering.
In both implementations, we'll create a data structure named output
and append the vertices one by one to it. Eventually, output
will give us the topological order
.
First implementation:
- Compute the indegrees of all vertices.
indegree of a vertex is the number of head ends adjacent to that vertex. - Find a vertex
u
with indegree = 0, and store it in theoutput
. If there is no such vertex, then it means that graph is not acyclic, therefore there is no topological order. - Remove
u
and all its edges from the graph. - Update the indegrees of the remaining vertices if necessary.
- Repeat the steps 2,3 and 4
V
times,V
being the number of vertices in the graph.
Time complexity of this implementation isO(V^2)
.
More efficient implementation:
- Compute the indegrees of all vertices.
- Store all vertices with indegree 0 in an arbitrary data structure*
A
and store all vertices with positive indegree in another arbitrary data structureB
. - Remove vertex
u
from theA
and store it inoutput
. - For all edges
(u, v)
, update the indegree ofv
. If indegree ofv
is 0, take it fromB
and put it to theA
. - Repeat steps 3 and 4 until
A
andB
become empty.
Time complexity of this implementation isO(V E)
,E
being the number of edges in the graph.
*= Any data structure that allows adding and removing in O(1)
(constant time) will be ok.
Notice that topological order
may not be unique. For example there is no hierarchy between attr_2>=200
and attr_3 IN ["yellow", "green"]
nodes in your graph. So their orders respective to each other may vary.