numpy.dot behaviour when multiplying an (m,) vector with an (m, n) matrix - CodePudding

I have been working with Python and NumPy for a few weeks, and it was not until today that I realized that, with

a = np.array([1,2,3])
b = np.array([[1,2], [3,4], [5,6]])

these two computations give the same result:

a @ b
b.T @ a

even though the first one doesn't make sense algebraically (a length-3 column vector cannot be multiplied by a 3x2 matrix).

So my question is: how does .dot work in the first computation? Or, more generally, how does NumPy treat 1-D and N-D arrays?

CodePudding user response：

You are probably not asking about np.dot, which has different broadcasting rules.
Because both of your examples use the @ operator, which is syntactic sugar for np.matmul, I'll answer your question in terms of np.matmul.

The answer is as simple as quoting the documentation of np.matmul:

The behavior depends on the arguments in the following way.

If both arguments are 2-D they are multiplied like conventional matrices.

If either argument is N-D, N > 2, it is treated as a stack of matrices residing in the last two indexes and broadcast accordingly.

If the first argument is 1-D, it is promoted to a matrix by prepending a 1 to its dimensions. After matrix multiplication the prepended 1 is removed.

If the second argument is 1-D, it is promoted to a matrix by appending a 1 to its dimensions. After matrix multiplication the appended 1 is removed.

(emphasis is mine).
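
To make the two 1-D rules concrete, here is a minimal sketch that performs the promotion by hand on the arrays from the question and compares the result with what @ returns. The names row, col, left and right are only for illustration, and this mimics the documented behaviour rather than showing how NumPy implements it internally.

```python
import numpy as np

a = np.array([1, 2, 3])                 # shape (3,)
b = np.array([[1, 2], [3, 4], [5, 6]])  # shape (3, 2)

# First argument is 1-D: prepend a 1, multiply, then drop the prepended axis.
row = a[np.newaxis, :]                  # shape (1, 3)
left = (row @ b)[0]                     # (1, 3) @ (3, 2) -> (1, 2) -> (2,)

# Second argument is 1-D: append a 1, multiply, then drop the appended axis.
col = a[:, np.newaxis]                  # shape (3, 1)
right = (b.T @ col)[:, 0]               # (2, 3) @ (3, 1) -> (2, 1) -> (2,)

print(left, right)
print(np.array_equal(left, a @ b), np.array_equal(right, b.T @ a))
```

So a @ b treats a as a row vector and b.T @ a treats it as a column vector, which is why both are defined and both give the same numbers.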

CodePudding user response：

a = np.array([1,2,3])
b = np.array([[1,2], [3,4], [5,6]])

With 1d and 2d arrays, dot and matmul do the same thing, though the documentation wording is a bit different.
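
A quick check of that claim, as a sketch. The 3-D arrays x and y are made up for illustration (they are not from the question) and are only there to show where dot and matmul stop agreeing.

```python
import numpy as np

a = np.array([1, 2, 3])
b = np.array([[1, 2], [3, 4], [5, 6]])

# 1-D with 2-D: dot and matmul return the same values.
same_1d_2d = np.array_equal(np.dot(a, b), np.matmul(a, b))

# 3-D with 3-D: the two functions combine the leading axes differently.
x = np.ones((2, 3, 4))
y = np.ones((2, 4, 5))
dot_shape = np.dot(x, y).shape        # every matrix in x paired with every matrix in y
matmul_shape = np.matmul(x, y).shape  # matrices paired one-to-one along the first axis

print(same_1d_2d, dot_shape, matmul_shape)
```

Here dot gives shape (2, 3, 2, 5) while matmul gives (2, 3, 5), so the difference only shows up above two dimensions.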

Two cases from dot:

- If `a` is an N-D array and `b` is a 1-D array, it is a sum product over
  the last axis of `a` and `b`.

- If `a` is an N-D array and `b` is an M-D array (where ``M>=2``), it is a
  sum product over the last axis of `a` and the second-to-last axis of `b`.

Your a is (3,), and b is (3,2):

In [263]: np.dot(b.T,a)
Out[263]: array([22, 28])

The first case applies here: (2,3) with (3,) -> sum product over the shared size-3 dimension.

In [264]: np.dot(a,b)
Out[264]: array([22, 28])

The second case applies here: a (3,) with a (3,2) -> sum product over the last axis of the (3,) and the second-to-last axis of the (3,2), again the shared size-3 dimension.

"Last of A, with the 2nd to the last of B" is the basic matrix multiplication rule. It only needs a tweak when B is 1d and so doesn't have a 2nd-to-the-last axis.
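
The "sum product over a shared axis" wording can be spelled out with np.einsum, where the repeated index i is the axis being summed. This is only a sketch to show which axes pair up; the variable names are illustrative, and it is not meant as a description of how dot is implemented.

```python
import numpy as np

a = np.array([1, 2, 3])                 # shape (3,)
b = np.array([[1, 2], [3, 4], [5, 6]])  # shape (3, 2)

# np.dot(a, b): last axis of a (i) against the second-to-last axis of b (i).
ab = np.einsum('i,ij->j', a, b)

# np.dot(b.T, a): last axis of b.T (i) against the only axis of a (i).
bta = np.einsum('ji,i->j', b.T, a)

print(ab, bta)
print(np.array_equal(ab, np.dot(a, b)), np.array_equal(bta, np.dot(b.T, a)))
```

In both calls the index that disappears is the size-3 one, and the surviving index j has size 2.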

The matmul rules are stated in terms of adding a dimension and later removing it.

- If the first argument is 1-D, it is promoted to a matrix by
  prepending a 1 to its dimensions. After matrix multiplication
  the prepended 1 is removed.
- If the second argument is 1-D, it is promoted to a matrix by
  appending a 1 to its dimensions. After matrix multiplication
  the appended 1 is removed.

(3,) with (3,2) => (1,3) with (3,2) => (1,2) => (2,)

(2,3) with (3,) => (2,3) with (3,1) => (2,1) => (2,)
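
The same shape bookkeeping also shows when the promotion does not help. As a sketch using the arrays from the question: b @ a becomes (3,2) with (3,1), and the inner sizes 2 and 3 do not match, so NumPy raises an error. The names mismatch_raised and kept_2d are only for illustration.

```python
import numpy as np

a = np.array([1, 2, 3])                 # shape (3,)
b = np.array([[1, 2], [3, 4], [5, 6]])  # shape (3, 2)

# (3,2) with (3,) => (3,2) with (3,1): inner sizes 2 and 3 disagree.
try:
    b @ a
    mismatch_raised = False
except ValueError as exc:
    mismatch_raised = True
    print("b @ a failed:", exc)

# Promoting explicitly keeps the extra axis instead of having it removed.
kept_2d = a[np.newaxis, :] @ b          # (1,3) with (3,2) => (1,2)
print(kept_2d.shape)
```

If you want the result to stay a row or column matrix, doing the promotion yourself as in the last line avoids the automatic removal of the added dimension.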