How to vectorize a 2 level loop in NumPy


Based on the comments, I have revised the example:

Consider the following code

import numpy as np

def subspace_angle(A, B):
    M = A.T @ B
    s = np.linalg.svd(M, compute_uv=False)
    return s[0]

def principal_angles(bases):    
    k = bases.shape[0]
    r = np.zeros((k, k))
    for i in range(k):
        x = bases[i]
        r[i, i] = subspace_angle(x, x)
        for j in range(i):
            y = bases[j]
            r[i, j] = subspace_angle(x, y)
            r[j, i] = r[i, j]
    r = np.minimum(1, r)
    return np.rad2deg(np.arccos(r))

The following is an example of its use:

bases = []
# number of subspaces
k = 5
# ambient dimension
n = 8
# subspace dimension
m = 4
for i in range(k):
    X = np.random.randn(n, m)
    Q,R = np.linalg.qr(X)
    bases.append(Q)
# combine the orthonormal bases for all the subspaces
bases = np.array(bases)
# Compute the smallest principal angles between each pair of subspaces.
print(np.round(principal_angles(bases), 2))

Is there a way to avoid the two-level for loop in the principal_angles function so that the code can be sped up? Note that the resulting matrix r is symmetric. Since subspace_angle can be compute-heavy depending on the array sizes, it is important to avoid computing it twice, once for r[i, j] and once for r[j, i].

Regarding the comment about JIT: I am actually writing the code with Google/JAX. The two-level loop does get JIT-compiled, giving performance benefits. However, the JIT compilation time is quite high (possibly due to the two levels of for loops). I am wondering if there is a better way to write this code so that it compiles faster.

CodePudding user response:

I started by copying your code into an ipython session, getting a (5,8,4)-shaped bases. But then I realized that func (from the pre-revision example) is undefined. So, commenting that out, I get:

In [6]: def principal_angles(bases):
   ...:     k = bases.shape[0]
   ...:     r = np.zeros((k, k))
   ...:     for i in range(k):
   ...:         x = bases[i]
   ...:         # r[i, i] = func(x, x)
   ...:         for j in range(i):
   ...:             y = bases[j]
   ...:             r[i, j] = subspace_angle(x, y)
   ...:             #r[j, i] = r[i, j]
   ...:     return r
   ...:     #r = np.minimum(1, r)
   ...:     #return np.rad2deg(np.arccos(r))
   ...: 
In [7]: r=principal_angles(bases)
In [8]: r.shape
Out[8]: (5, 5)

Since both matmul and svd can work with higher dimensions, i.e. batches, I wonder if it's possible to call subspace_angle with all bases, rather than iteratively.

We have to think carefully about what shapes we pass it, and how they evolve.

def subspace_angle(A, B):
    M = A.T @ B
    s = np.linalg.svd(M, compute_uv=False)
    return s[0]

(Oops, my os just crashed the terminal, so I'll have get back to this later.)

So A and B are (8,4), A.T is (4,8), and A.T @ B is (4,4).

If they were (5,8,4), A.transpose(0,2,1) would be (5,4,8), and M would be (5,4,4).

I believe np.linalg.svd accepts that M; with compute_uv=False it returns the singular values, with shape (5,4).
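A quick check of that batched behavior (my own snippet, not from the session above):

```python
import numpy as np

# batch of 5 matrices, shape (5, 8, 4)
A = np.random.randn(5, 8, 4)

# batched A.T @ A: transpose only the last two axes
M = A.transpose(0, 2, 1) @ A
print(M.shape)   # (5, 4, 4)

# svd operates on the last two axes; singular values come back batched
s = np.linalg.svd(M, compute_uv=False)
print(s.shape)   # (5, 4)
```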

In [29]: r=principal_angles(bases)
In [30]: r
Out[30]: 
array([[0.        , 0.        , 0.        , 0.        , 0.        ],
       [0.99902153, 0.        , 0.        , 0.        , 0.        ],
       [0.99734371, 0.95318936, 0.        , 0.        , 0.        ],
       [0.99894054, 0.99790422, 0.87577343, 0.        , 0.        ],
       [0.99840093, 0.92809283, 0.99896121, 0.98286429, 0.        ]])

Let's try that with the whole bases. Use broadcasting to get the 'outer' product on the first dimension:

In [31]: M=bases[:,None,:,:].transpose(0,1,3,2)@bases
In [32]: r1=np.linalg.svd(M, compute_uv=False)
In [33]: M.shape
Out[33]: (5, 5, 4, 4)
In [34]: r1.shape
Out[34]: (5, 5, 4)

To match your s[0], I have to take the first singular value along the last axis (np.linalg.svd returns singular values in descending order):

In [35]: r1[:,:,0]
Out[35]: 
array([[1.        , 0.99902153, 0.99734371, 0.99894054, 0.99840093],
       [0.99902153, 1.        , 0.95318936, 0.99790422, 0.92809283],
       [0.99734371, 0.95318936, 1.        , 0.87577343, 0.99896121],
       [0.99894054, 0.99790422, 0.87577343, 1.        , 0.98286429],
       [0.99840093, 0.92809283, 0.99896121, 0.98286429, 1.        ]])
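To convince myself the broadcast result matches the loop version, here is a small sanity check (my own addition, reusing subspace_angle from the question):

```python
import numpy as np

def subspace_angle(A, B):
    M = A.T @ B
    s = np.linalg.svd(M, compute_uv=False)
    return s[0]

rng = np.random.default_rng(0)
bases = np.array([np.linalg.qr(rng.standard_normal((8, 4)))[0] for _ in range(5)])

# loop version: lower triangle only
r = np.zeros((5, 5))
for i in range(5):
    for j in range(i):
        r[i, j] = subspace_angle(bases[i], bases[j])

# broadcast version: all pairs at once
M = bases[:, None, :, :].transpose(0, 1, 3, 2) @ bases
r1 = np.linalg.svd(M, compute_uv=False)[:, :, 0]

print(np.allclose(np.tril(r1, k=-1), r))  # True
```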

The time savings aren't massive, but should grow when the first dimension is larger than 5:

In [36]: timeit r=principal_angles(bases)
320 µs ± 554 ns per loop (mean ± std. dev. of 7 runs, 1000 loops each)
In [37]: %%timeit
    ...: M=bases[:,None,:,:].transpose(0,1,3,2)@bases
    ...: r1=np.linalg.svd(M, compute_uv=False)[:,:,0]
    ...: 
    ...: 
190 µs ± 450 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)

This may be enough to get you started with a more refined "vectorization".

CodePudding user response:

After some more thinking and experimenting with the np.triu_indices function, I have come up with the following solution, which avoids the extra unnecessary computation.

def vectorized_principal_angles(subspaces):
    # number of subspaces
    n = subspaces.shape[0]
    # Indices for upper triangular matrix
    i, j = np.triu_indices(n, k=1)
    # prepare all the possible pairs of A and B
    A = subspaces[i]
    B = subspaces[j]
    # Compute the Hermitian transpose of each matrix in A array
    AH = np.conjugate(np.transpose(A, axes=(0,2,1)))
    # Compute M = A^H B for each matrix pair
    M = np.matmul(AH, B)
    # Compute the SVD for each matrix in M
    s = np.linalg.svd(M, compute_uv=False)
    # keep only the first singular value for each M
    s = s[:, 0]
    # prepare the result matrix
    # It is known in advance that the diagonal elements will be 1;
    # seed the diagonal with 0.5 so the symmetrization below doubles it to 1.
    r = 0.5 * np.eye(n)
    r[i, j] = s
    # Symmetrize the matrix
    r = r + r.T
    # Final result
    return r  

Here is what is going on:

  • np.triu_indices(n, k=1) gives me the indices for the n(n-1)/2 possible pairs of matrices.
  • All the remaining computation is limited to those n(n-1)/2 pairs only.
  • Finally, the array of scalar values is put back into a square symmetric result matrix.
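For reference, here is what np.triu_indices produces for n = 4 (my own illustration):

```python
import numpy as np

# row/column indices of the strict upper triangle (k=1 excludes the diagonal)
i, j = np.triu_indices(4, k=1)
print(i)  # [0 0 0 1 1 2]
print(j)  # [1 2 3 2 3 3]
```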

Thank you @hpaulj for your solution. It helped me a lot in getting the right direction.
