How to utilize scalar multiplication in einsum?


The NumPy documentation for einsum (https://numpy.org/doc/stable/reference/generated/numpy.einsum.html) includes this example:

Broadcasting and scalar multiplication:

    >>> np.einsum('..., ...', 3, c)
    array([[ 0,  3,  6],
           [ 9, 12, 15]])

This suggests that einsum can mimic the alpha/beta prefactors of DGEMM (http://www.netlib.org/lapack/explore-html/d1/d54/group__double__blas__level3_gaeda3cbd99c8fb834a60a6412878226e1.html).

Does this imply that including the scalar multiplication inside einsum as a single step will be faster than doing it in two steps: (1) A,B->C and (2) C*prefactor?

I tried to extend an example from https://ajcr.net/Basic-guide-to-einsum/ as follows:

    import numpy as np
    A = np.array([0, 1, 2])
    B = np.array([[ 0,  1,  2,  3],  [ 4,  5,  6,  7], [ 8,  9, 10, 11]])
    C = np.einsum('i,ij->i', 2., A, B)

and got ValueError: einstein sum subscripts string contains too many subscripts for operand.

So my question is: is there a way to include a scalar factor inside einsum, and does doing so speed up the calculation?

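I haven't used this scalar feature before, but here's how it works. The scalar is an operand like any other, so it needs its own (empty) term in the subscripts string: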

In [422]: np.einsum('i,ij->i',A,B)
Out[422]: array([ 0, 22, 76])

In [423]: np.einsum(',i,ij->i',2,A,B)
Out[423]: array([  0,  44, 152])
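To spell out why the original call failed: the string 'i,ij->i' has only two subscript terms, so einsum pairs 'i' with the first operand, the 0-d scalar, which has no axes. A minimal sketch using the question's A and B (the variable names raised, scaled and two_step are just mine for illustration):

```python
import numpy as np

A = np.array([0, 1, 2])
B = np.array([[0, 1, 2, 3], [4, 5, 6, 7], [8, 9, 10, 11]])

# Three operands but only two subscript terms: 'i' gets matched to the
# scalar 2., which has zero dimensions, so einsum raises ValueError.
try:
    np.einsum('i,ij->i', 2., A, B)
    raised = False
except ValueError:
    raised = True

# Giving the scalar its own empty term (note the leading comma) works.
scaled = np.einsum(',i,ij->i', 2., A, B)

# Same result as scaling after the contraction.
two_step = 2. * np.einsum('i,ij->i', A, B)

print(raised)    # True
print(scaled)    # [  0.  44. 152.]
```

The empty term can go in any position, as long as it lines up with the position of the scalar in the operand list.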

The time savings appear to be minor:

In [424]: timeit np.einsum(',i,ij->i',2,A,B)
11.5 µs ± 271 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)

In [425]: timeit 2*np.einsum('i,ij->i',A,B)
12.3 µs ± 274 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)
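Those arrays are tiny, so I'd expect the numbers above to be mostly call overhead. If you want to check on something closer to your real workload, here is a rough timing sketch. The array sizes, alpha, and the three helper function names are my own assumptions, and the third variant (scaling the smaller operand first) is just another option to compare, not something einsum does for you. I'm not claiming any particular outcome; run it on your own machine.

```python
import timeit
import numpy as np

rng = np.random.default_rng(0)
A = rng.random(200)          # assumed sizes, adjust to your problem
B = rng.random((200, 300))
alpha = 2.0

def fused():
    # scalar passed as an operand with an empty subscript term
    return np.einsum(',i,ij->i', alpha, A, B)

def two_step():
    # contract first, then scale the result
    return alpha * np.einsum('i,ij->i', A, B)

def prescaled():
    # scale the smaller operand first, then contract
    return np.einsum('i,ij->i', alpha * A, B)

calls = 20
timings = {
    f.__name__: min(timeit.repeat(f, number=calls, repeat=3))
    for f in (fused, two_step, prescaled)
}
for name, seconds in timings.items():
    print(f"{name}: {seconds / calls * 1e6:.1f} µs per call")
```

All three variants compute the same values, so whichever is fastest for your shapes can be swapped in safely.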

Another example, with two scalar factors:

In [427]: np.einsum(',i,,ij->i',3,A,2,B)
Out[427]: array([  0, 132, 456])
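For completeness, a small check that ties this back to the documentation example from the question. I'm assuming c = np.arange(6).reshape(2, 3), which is consistent with the output quoted there. The names doc_example, two_scalars and reference are only for illustration.

```python
import numpy as np

A = np.array([0, 1, 2])
B = np.array([[0, 1, 2, 3], [4, 5, 6, 7], [8, 9, 10, 11]])
c = np.arange(6).reshape(2, 3)  # assumed definition of c

# Ellipsis form: '...' covers zero axes for the scalar 3 and both axes of c.
doc_example = np.einsum('..., ...', 3, c)

# Explicit form with two scalars, each with its own empty term.
two_scalars = np.einsum(',i,,ij->i', 3, A, 2, B)

# Reference: apply both factors after the contraction.
reference = 3 * 2 * np.einsum('i,ij->i', A, B)

print(doc_example)   # [[ 0  3  6]
                     #  [ 9 12 15]]
print(two_scalars)   # [  0 132 456]
```

The ellipsis form is handy when you don't want to spell out the array's indices; the empty-term form is needed when the other operands have named indices that are summed over.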
