Will the interpreter optimise for you in numpy


Say that I have two m×n numpy matrices, A and B, and an m×1 numpy matrix, x. I want to evaluate A @ B.T @ x.

Evaluating left to right naively, I first have to form an m×m matrix and then multiply that by the vector x. However, any human or smart compiler would evaluate B.T @ x first to avoid forming the matrix.

I don't know if this is the case for Python. Will it naively form the matrix evaluating left to right? And if so, can I override this behaviour by using brackets to enforce an order of operations (as in result = A @ (B.T @ x))? Or will Python disregard these brackets upon seeing that the expression is equivalent without them?

CodePudding user response:

Will it naively form the matrix evaluating left to right?

Yes, the matrix multiply operator is left-associative, like most (but not all) of Python's operators.

The expression a @ b @ c is evaluated as ((a @ b) @ c), regardless of the size or type of a, b, and c.

The precedence/associativity is determined by the Python interpreter; libraries like NumPy cannot change it.
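For illustration, here is a minimal sketch (the Tracer class is just a made-up helper, not part of NumPy) that logs every __matmul__ call and makes the grouping visible:

class Tracer:
    def __init__(self, name):
        self.name = name

    def __matmul__(self, other):
        # Each @ call prints the grouping it was asked to compute.
        combined = f"({self.name} @ {other.name})"
        print(combined)
        return Tracer(combined)

a, b, c = Tracer("a"), Tracer("b"), Tracer("c")
a @ b @ c

Output:

(a @ b)
((a @ b) @ c)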

Or will Python disregard these brackets upon seeing that the expression is equivalent without them?

No. If you specify an order, Python always follows that order.
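As a sketch of why the brackets matter (the shapes here are my own illustrative choice, not from the question): the unbracketed order builds an m×m intermediate, while the bracketed one only builds an n×1 vector.

import numpy as np

m, n = 2000, 1000
A = np.ones((m, n))
B = np.ones((m, n))
x = np.ones((m, 1))

print((A @ B.T).shape)  # (2000, 2000) -- large m×m intermediate
print((B.T @ x).shape)  # (1000, 1)    -- small n×1 intermediate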

However, any human or smart compiler would evaluate B.T @ x first to avoid forming the matrix.

Python itself will not do this reordering for you; as noted above, the interpreter simply applies @ left to right. To see whether a compiler would, I also experimented with the Numba package. Numba is a library that invokes an optimizing JIT compiler on NumPy-heavy code. I used fastmath=True, which allows it to reorder some floating-point operations.

Code:

import numba as nb


@nb.njit(fastmath=True)
def mult_badorder_nb(A, B, x):
    # Left-to-right grouping: forms the large (A @ B) intermediate first.
    return (A @ B) @ x


@nb.njit(fastmath=True)
def mult_goodorder_nb(A, B, x):
    # Explicit grouping: forms the small (B @ x) vector first.
    return A @ (B @ x)

I could not get it to optimize mult_badorder_nb into mult_goodorder_nb. In principle it has enough information to perform that rewrite, but in practice it does not.
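For reference, this is roughly how I would compare the two orderings by hand (a sketch only: the shapes are arbitrary, it reuses the two jitted functions above, and the warm-up calls absorb Numba's compilation time):

import numpy as np
import timeit

m, n = 2000, 1000
A = np.ones((m, n))
Bt = np.ones((n, m))   # pass B already transposed, so the functions see plain 2-D arrays
x = np.ones((m, 1))

# Warm-up calls trigger compilation before timing.
mult_badorder_nb(A, Bt, x)
mult_goodorder_nb(A, Bt, x)

print(timeit.timeit(lambda: mult_badorder_nb(A, Bt, x), number=3))
print(timeit.timeit(lambda: mult_goodorder_nb(A, Bt, x), number=3))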

Is @ associative in all situations?

No, and there are three cases to watch out for.

First, even if (A @ B) @ x is mathematically equivalent to A @ (B @ x), that does not mean it is numerically equivalent when computed with finite-precision floating-point arithmetic.

For example, here are two matrices that provide different results when multiplied in a different order. I found these through a random search.

import numpy as np


A = np.array([[6e-20, 4e-38], [3e-13, 2e-31]])
B = np.linalg.inv(A)
x = np.array([1, 1])
print((A @ B) @ x)
print(A @ (B @ x))

Output:

[0.99999993 0.29886705]
[ 1.00609092e+00 -1.88414573e+06]

Second, @ is not associative when a vector appears in the middle. In other words, if A and B are matrices, and x is a vector, then (A @ x) @ B is not equivalent to A @ (x @ B).
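A tiny concrete illustration (the numbers are my own, chosen so that both groupings are defined but give different results):

import numpy as np

A = np.array([[1, 2], [3, 4]])
B = np.array([[0, 1], [1, 0]])
x = np.array([1, 0])

print((A @ x) @ B)  # [3 1]
print(A @ (x @ B))  # [2 4]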

Third, if you are working with something other than NumPy arrays, then those types can override @ to do anything, and that might not be associative.

CodePudding user response:

Here are some time tests for the different orderings:

In [2]: n,m=1000,2000; A = np.ones((m,n)); B=A.copy(); x=np.ones((m,1))

In [3]: timeit A@B.T@x
832 ms ± 33 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

In [4]: timeit A@(B.T@x)
6.2 ms ± 329 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

einsum can choose alternative evaluation orders:

In [5]: timeit np.einsum('ij,jk,kl->il',A,B.T,x, optimize=True)
3.64 ms ± 212 µs per loop (mean ± std. dev. of 7 runs, 1 loop each)

Without optimize it works on all three operands at once (a single nested loop over all indices), so it's not the same as (A@B.T)@x.

In [6]: timeit np.einsum('ij,jk,kl->il',A,B.T,x, optimize=False)
29.2 s ± 2.3 s per loop (mean ± std. dev. of 7 runs, 1 loop each)

Here's the breakdown of computations for the optimal case:

In [7]: print(np.einsum_path('ij,jk,kl->il',A,B.T,x, optimize=True)[1])
  Complete contraction:  ij,jk,kl->il
         Naive scaling:  4
     Optimized scaling:  3
      Naive FLOP count:  1.200e+10
  Optimized FLOP count:  8.000e+06
   Theoretical speedup:  1500.000
  Largest intermediate:  2.000e+03 elements
--------------------------------------------------------------------------
scaling                  current                                remaining
--------------------------------------------------------------------------
   3                   kl,jk->lj                                ij,lj->il
   3                   lj,ij->il                                   il->il

There's another function that evaluates alternative orders:

In [11]: timeit np.linalg.multi_dot((A,B.T,x))
5.96 ms ± 323 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

Note that these are functions that do their own evaluation of alternative orders; they don't depend on the interpreter's default evaluation order.
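A minimal sketch of that (shapes here are illustrative): multi_dot receives the operands in order, with no grouping, chooses the cheapest parenthesization itself, and still matches the explicitly bracketed result.

import numpy as np

m, n = 200, 100
A = np.ones((m, n))
B = np.ones((m, n))
x = np.ones((m, 1))

r1 = np.linalg.multi_dot([A, B.T, x])  # order chosen internally
r2 = A @ (B.T @ x)                     # order forced by brackets
print(np.allclose(r1, r2))  # True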

Before @ was added as an operator, we had to write nested dot or matmul calls:

In [13]: timeit np.dot(A,np.dot(B.T,x))
6.11 ms ± 279 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)