Given a 3 x 3 matrix S
(mechanical stress tensor) of the form
        | sxx sxy szx |
    S = | sxy syy syz |
        | szx syz szz |
I can obtain the 3 eigenvalues (principal stresses) from
import numpy as np

# Example values (used in the comment below):
sxx, syy, szz = 100.0, -50.0, 0.0
sxy, szx, syz = 75.0, 0.0, 0.0

S = np.array([[sxx, sxy, szx],
              [sxy, syy, syz],
              [szx, syz, szz]])
e_val, e_vec = np.linalg.eig(S)
principal_stress = np.sort(e_val)  # 3 principal components sorted from min to max
# with sxx=100, syy=-50, sxy=75, szz=szx=syz=0 the result is
# (-81.07, 0.0, 131.07)
Now, I have a large 2d array in which each column contains the 6 elements (sxx, syy, szz, sxy, szx, syz) for a given "point of interest" (poi).
There can be hundreds of thousands of columns (if not millions) in this array...
           | sxx_1 sxx_2 ... sxx_n |
           | syy_1 syy_2 ... syy_n |
S_by_poi = | szz_1 szz_2 ... szz_n |
           | sxy_1 sxy_2 ... sxy_n |
           | szx_1 szx_2 ... szx_n |
           | syz_1 syz_2 ... syz_n |
What would be the fastest way to compute the 3 eigenvalues for every column, sort them, and put them into an array P_by_poi, like this:
           | pmin_1 pmin_2 ... pmin_n |
P_by_poi = | pmid_1 pmid_2 ... pmid_n |
           | pmax_1 pmax_2 ... pmax_n |
Can I somehow avoid looping over the columns of S_by_poi in Python? That does not appear to be a good or performant solution...
CodePudding user response:
np.linalg.eig and np.sort are slow for small matrices like these. While there might be a vectorized way to do this, I do not expect it to be fast, because the implementation of np.linalg.eig is not efficient for this specific case anyway.
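For reference, the fully vectorized NumPy route does exist: np.linalg.eigvalsh accepts a stacked (n, 3, 3) array of symmetric matrices and returns each one's eigenvalues already sorted in ascending order, so no explicit Python loop and no separate sort are needed. A minimal sketch (the function name principal_stresses_numpy is mine, not from the question), assuming the (6, n) S_by_poi layout described above; as noted, it may still trail a specialized analytic implementation:

```python
import numpy as np

def principal_stresses_numpy(S_by_poi):
    """Vectorized principal stresses for a (6, n) array of stress components."""
    # Rows of S_by_poi are: sxx, syy, szz, sxy, szx, syz
    sxx, syy, szz, sxy, szx, syz = S_by_poi
    n = S_by_poi.shape[1]
    # Build an (n, 3, 3) stack of symmetric matrices
    S = np.empty((n, 3, 3), dtype=np.float64)
    S[:, 0, 0] = sxx
    S[:, 1, 1] = syy
    S[:, 2, 2] = szz
    S[:, 0, 1] = S[:, 1, 0] = sxy
    S[:, 0, 2] = S[:, 2, 0] = szx
    S[:, 1, 2] = S[:, 2, 1] = syz
    # eigvalsh is specialized for symmetric (Hermitian) matrices and
    # returns eigenvalues sorted in ascending order for each matrix
    return np.linalg.eigvalsh(S).T  # shape (3, n): rows pmin, pmid, pmax
```

With the example values from the question (sxx=100, syy=-50, sxy=75, the rest 0) in a single column, this returns approximately (-81.07, 0.0, 131.07) for that column.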
One efficient solution is to implement this in Numba or Cython, based on the analytic formula for the eigenvalues of a 3x3 symmetric matrix. The sort can be done efficiently using a sorting network. The analytic formula can be obtained from mathematical tools like Wolfram Alpha, and the resulting expression can then be factorized manually, though compilers are able to apply common sub-expression elimination. Note that the intermediate variables are complex numbers, and I am not sure the eigenvalues are always real in your case; I assumed they are, because you want to sort them, which requires the values to be real (indeed, a real symmetric matrix always has real eigenvalues). Here is the resulting implementation:
import numba as nb
import numpy as np

@nb.njit
def sorted_eigenvalues(v):
    assert v.shape == (3, 3)
    a, b, c, d, e, f = v[0,0], v[1,1], v[2,2], v[0,1], v[0,2], v[1,2]
    assert v[1,0] == d and v[2,0] == e and v[2,1] == f
    # Analytic eigenvalue solution of the 3x3 symmetric input matrix
    tmp1 = -a**2 + a*b + a*c - b**2 + b*c - c**2 - 3*d**2 - 3*e**2 - 3*f**2
    tmp2 = 2*a**3 - 3*a**2*b - 3*a**2*c - 3*a*b**2 + 12*a*b*c - 3*a*c**2 + 9*a*d**2 + 9*a*e**2 - 18*a*f**2 + 2*b**3 - 3*b**2*c - 3*b*c**2 + 9*b*d**2 - 18*b*e**2 + 9*b*f**2 + 2*c**3 - 18*c*d**2 + 9*c*e**2 + 9*c*f**2 + 54*d*e*f
    tmp3 = np.sqrt((4*tmp1**3 + tmp2**2) + 0j)
    tmp4 = (tmp2 + tmp3) ** (1/3)
    tmp5 = 1/3 * (a + b + c)
    tmp6 = 1 + 1j*np.sqrt(3)
    tmp7 = 1 - 1j*np.sqrt(3)
    eigv1 = tmp4/(3*2**(1/3)) - (2**(1/3)*tmp1)/(3*tmp4) + tmp5
    eigv2 = (tmp6*tmp1)/(3*2**(2/3)*tmp4) - (tmp7*tmp4)/(6*2**(1/3)) + tmp5
    eigv3 = (tmp7*tmp1)/(3*2**(2/3)*tmp4) - (tmp6*tmp4)/(6*2**(1/3)) + tmp5
    # Assume the values are real and remove the FP rounding errors
    eigv1 = np.real(eigv1)
    eigv2 = np.real(eigv2)
    eigv3 = np.real(eigv3)
    # Sort the eigenvalues using a fast sorting network
    eigv1, eigv2 = min(eigv1, eigv2), max(eigv1, eigv2)
    eigv2, eigv3 = min(eigv2, eigv3), max(eigv2, eigv3)
    eigv1, eigv2 = min(eigv1, eigv2), max(eigv1, eigv2)
    return eigv1, eigv2, eigv3
This function is about 50 times faster than np.linalg.eig for computing one 3x3 symmetric matrix on my machine (the computation itself takes only about 0.5 µs with Numba, while the overhead of calling a Numba function is about 0.2-0.3 µs).
The second step is to loop over the columns of the S_by_poi array in a Numba function and call the above function for each column. For better performance, I advise you to pass the parameters a, b, c, d, e, f directly to sorted_eigenvalues, so as to avoid the expensive creation of NumPy arrays (and even some memory loads/stores). I expect this to be about 100 times faster than the NumPy code in a loop (since the per-call Numba overhead is removed). You can even parallelize the computation using prange and parallel=True in Numba, resulting in an even faster computation (several hundred times faster, if not several thousand times faster, on high-end computing servers)!
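Putting the two steps together, here is a sketch of such a batch driver (the name sorted_eigenvalues_by_poi is mine, not from the answer above). It inlines the analytic formula on scalars, as advised. So that the sketch also runs with plain NumPy, the Numba decorator is commented out; with Numba installed, uncomment it and replace range with nb.prange. Note one caveat: tmp4 is zero for matrices proportional to the identity, so a robust version would special-case that degenerate input.

```python
import numpy as np
# import numba as nb   # uncomment together with the decorator when Numba is available

# @nb.njit(parallel=True)
def sorted_eigenvalues_by_poi(S_by_poi):
    """Sorted principal stresses for each column of a (6, n) array."""
    n = S_by_poi.shape[1]
    P_by_poi = np.empty((3, n))
    for i in range(n):  # use nb.prange(n) with parallel=True
        a, b, c = S_by_poi[0, i], S_by_poi[1, i], S_by_poi[2, i]  # sxx, syy, szz
        d, e, f = S_by_poi[3, i], S_by_poi[4, i], S_by_poi[5, i]  # sxy, szx, syz
        # Analytic eigenvalues of the symmetric 3x3 matrix (same formula as above)
        tmp1 = -a**2 + a*b + a*c - b**2 + b*c - c**2 - 3*d**2 - 3*e**2 - 3*f**2
        tmp2 = (2*a**3 - 3*a**2*b - 3*a**2*c - 3*a*b**2 + 12*a*b*c - 3*a*c**2
                + 9*a*d**2 + 9*a*e**2 - 18*a*f**2 + 2*b**3 - 3*b**2*c - 3*b*c**2
                + 9*b*d**2 - 18*b*e**2 + 9*b*f**2 + 2*c**3 - 18*c*d**2
                + 9*c*e**2 + 9*c*f**2 + 54*d*e*f)
        tmp3 = np.sqrt((4*tmp1**3 + tmp2**2) + 0j)
        tmp4 = (tmp2 + tmp3) ** (1/3)  # zero only for multiples of the identity
        tmp5 = 1/3 * (a + b + c)
        tmp6 = 1 + 1j*np.sqrt(3)
        tmp7 = 1 - 1j*np.sqrt(3)
        eigv1 = np.real(tmp4/(3*2**(1/3)) - (2**(1/3)*tmp1)/(3*tmp4) + tmp5)
        eigv2 = np.real((tmp6*tmp1)/(3*2**(2/3)*tmp4) - (tmp7*tmp4)/(6*2**(1/3)) + tmp5)
        eigv3 = np.real((tmp7*tmp1)/(3*2**(2/3)*tmp4) - (tmp6*tmp4)/(6*2**(1/3)) + tmp5)
        # Sorting network: min/mid/max in 3 compare-exchanges
        eigv1, eigv2 = min(eigv1, eigv2), max(eigv1, eigv2)
        eigv2, eigv3 = min(eigv2, eigv3), max(eigv2, eigv3)
        eigv1, eigv2 = min(eigv1, eigv2), max(eigv1, eigv2)
        P_by_poi[0, i] = eigv1
        P_by_poi[1, i] = eigv2
        P_by_poi[2, i] = eigv3
    return P_by_poi
```

The result has the requested (3, n) layout with rows pmin, pmid, pmax, and agrees with np.linalg.eigvalsh applied per column.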