Imagine we have a tree of values stored in a NumPy array. For example -
In [1]: import numpy as np
In [2]: tree = np.array([[0, 6], [0, 4], [1, 3], [2, 9], [3, 1], [2, 7]]);
In [3]: tree.shape
Out[3]: (6, 2)
Each node in the tree is row in the array. The first row tree[0]
is the root node [0, 6]
. The first column tree[:,0]
contains the row number of the node's parent and the second column tree[:,1]
contains the node's value attribute.
What is most efficient way to access the value attributes of a given node up to root via its ancestors? For example, for the sixth node [2, 7]
, this would be [7, 3, 4, 6]
One method is to recursively read the array from the starting node up using the first column as an index for the next ancestor, for example -
In [20]: i = 5
...: values = []
...: while i > 0:
...: values.append(tree[i, 1])
...: i = tree[i, 0]
...: values.append(tree[0, 1])
...: print(values)
[7, 3, 4, 6]
but I found this to be slow for large complex trees. Is there a faster way?
Background to my question - I am trying to implement the Monte Carlo tree search (MCTS)
CodePudding user response:
Not sure it could help. It depends on how often you do this operation, and also how big and how deep your tree is.
But basically, my suggestion, if you need to accelerate this, would be "precompute every thing for every node", just, then, you can do it numpy's style:
preList=[tree]
idx=tree[:,0]
while (preList[-1][:,0]!=0).any():
preList.append(preList[-1][idx])
pre=np.stack(preList)
# Values of 6th node
pre[:,5][:,1]
# array([7, 3, 4, 6])
Note that it would always give 4 values, repeating root value if needed. But you can stop at the first pre[:,5][:,0] that is 0 (root).
This is just the same thing you are doing (from a row #i = [j,v], getting the parent row #j). Just done once for all on all nodes. To get a 3D matrix, whose 1st axis is the ancestor axis
Note that if your computation timing were unbearable with your current algorithm, meaning that your tree is very deep, then, chances are that mine would suffer from heavy memory usage, since my 3d matrix has the size of your 2d matrix times the tree depth.
For cpu usage tho, even if, from a strictly "number of operations point of view" what I do is just precompute your algorithm for all nodes in the tree, it probably worth it even if you don't need that much computation, because obviously, numpy array indexation is faster.
With a tree as small as yours, it takes 46 such request before my method is cheaper than yours (it takes 46 requests to absorb cost of precomputation). which is not good, considering that you have only 6 nodes.
But for a 13 nodes tree, precomputation time is 76μs, your code needs 3.12 μs/request, mine 350 ns. So number of request before it worth it drops to 27. Still more than number of nodes (13).
For a 27 nodes tree, precomputation time is 84μs, your code needs 3.81 μs/request, mine still 350 ns. So number of requests for which precomputation is profitable is 24.
In CPU time, precomputation is O(n.log(n)), your code is O(log(n)). And my request is O(1). So, in theory, if number of requests is k, that is O(n.log(n) k) on my side, and O(klog(n)) on yours. Which becomes equivalent if k~n. As I said, it is just as calling your code on all possible nodes. But because of numpy efficiency, precomputation costs less that calling your code n times. So, it is worthy even if k is smaller than n.
CodePudding user response:
For iterative operation like this, Numpy does not provide any (efficient) vectorization functions. A solution to speed this up is to use Numba (a JIT compiler) and return a Numpy array (since Numba can operate more efficiently on them). Here is an example:
import numba as nb
import numpy as np
@nb.njit(['(int16[:,:], int_)', '(int32[:,:], int_)', '(int64[:,:], int_)'])
def compute(tree, i):
values = np.empty(max(tree.shape[0], 1), dtype=tree.dtype)
cur = 0
while i > 0:
assert cur < values.size
values[cur] = tree[i, 1]
i = tree[i, 0]
cur = 1
assert cur < values.size
values[cur] = tree[0, 1]
return values[:cur 1] # Consider using copy() if cur << tree.shape[0]
print(compute(tree, 5))
It takes 0.76 us on my machine as opposed to 1.36 us for the initial code. However, ~0.54 us are spent in calling the JIT and checking the input parameter and 0.1~0.2 us are spent in the allocation of the output array. Thus, basically 90% of the time of the Numba function is a constant overhead. It should be much faster for large trees. If you have many small trees to compute, then you can call it from a Numba function so to avoid the overhead of calling a JIT function from the slow CPython interpreter. When called from a JIT function, the above function takes only 0.063 us on the input example. Thus, the Numba function can be up to 22 times faster in this case.
Note that it is better to use a small datatype for the tree since random accesses are expensive in large arrays. The smaller the array in memory, the more likely it can fit in CPU caches, the faster the computation. For trees with less than 65536 items, it is safe to use a uint16
datatype (while the default one is int32
on Windows and int64
on Linux, that is, respectively 2 and 4 times bigger).