I have been working with Python and numpy for a few weeks, and it was not until today that I realized that with
a = np.array([1,2,3])
b = np.array([[1,2], [3,4], [5,6]])
these two computations give the same result:
a @ b
b.T @ a
even though the first one doesn't make sense algebraically (the dimensions don't line up).
So my question is: how does the .dot algorithm work in the first computation? Or, more generally, how does numpy treat 1-D and N-D arrays?
Answer:
You are possibly not asking about np.dot, which has different broadcasting rules. Because both of your examples use the @ operator, which is syntactic sugar for np.matmul, I'll answer your question in terms of np.matmul.
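To see that np.dot and np.matmul really do follow different rules once more than two dimensions are involved, here is a small check. The array shapes are arbitrary choices for illustration; only the resulting shapes matter.

```python
import numpy as np

# Two stacks of matrices: two (3, 4) matrices and two (4, 5) matrices
A = np.ones((2, 3, 4))
B = np.ones((2, 4, 5))

# matmul treats the leading axis as a stack and pairs the matrices up
print(np.matmul(A, B).shape)  # (2, 3, 5)

# dot contracts the last axis of A with the second-to-last axis of B
# and keeps every remaining axis of both operands
print(np.dot(A, B).shape)     # (2, 3, 2, 5)
```

For the 1-D and 2-D arrays in the question the two functions agree, so the difference only shows up with stacks like these.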
The answer is as simple as quoting the documentation of np.matmul:
The behavior depends on the arguments in the following way.
- If both arguments are 2-D they are multiplied like conventional matrices.
- If either argument is N-D, N > 2, it is treated as a stack of matrices residing in the last two indexes and broadcast accordingly.
- If the first argument is 1-D, it is promoted to a matrix by prepending a 1 to its dimensions. After matrix multiplication the prepended 1 is removed.
- If the second argument is 1-D, it is promoted to a matrix by appending a 1 to its dimensions. After matrix multiplication the appended 1 is removed.
(emphasis is mine).
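The "stack of matrices" rule and the 1-D promotion rule also combine. The sketch below builds a hypothetical stack of four (2, 3) matrices from the question's b (each entry is b.T scaled by 1 to 4, a made-up example) and multiplies it by the question's b and a.

```python
import numpy as np

a = np.array([1, 2, 3])
b = np.array([[1, 2], [3, 4], [5, 6]])

# A stack of four (2, 3) matrices: entry k is (k + 1) * b.T
stack = np.arange(1, 5)[:, None, None] * b.T   # shape (4, 2, 3)

# N-D @ 2-D: b is broadcast across the stack, one (2, 3) @ (3, 2) per entry
print((stack @ b).shape)  # (4, 2, 2)

# N-D @ 1-D: a is promoted to (3, 1), multiplied per entry, then the 1 is removed
out = stack @ a
print(out.shape)          # (4, 2)
print(out[0])             # [22 28], the same as b.T @ a
```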
Answer:
a = np.array([1,2,3])
b = np.array([[1,2], [3,4], [5,6]])
With 1-D and 2-D arrays, np.dot and np.matmul give the same results, though their documentation words the rules a bit differently.
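A quick way to convince yourself is to run both functions on every 1-D/2-D combination built from the question's arrays and compare. This is only a spot check on these particular arrays, not a proof.

```python
import numpy as np

a = np.array([1, 2, 3])
b = np.array([[1, 2], [3, 4], [5, 6]])

# Every 1-D / 2-D combination of operands with compatible shapes
cases = [
    (a, a),      # 1-D with 1-D: inner product (a scalar)
    (a, b),      # 1-D with 2-D
    (b.T, a),    # 2-D with 1-D
    (b.T, b),    # 2-D with 2-D: ordinary matrix product
]

for x, y in cases:
    d, m = np.dot(x, y), np.matmul(x, y)
    # Same values and same shape from both functions
    assert np.array_equal(d, m) and np.shape(d) == np.shape(m)
    print(np.shape(x), np.shape(y), '->', np.shape(d))
```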
Two cases from the np.dot documentation:
- If `a` is an N-D array and `b` is a 1-D array, it is a sum product over
the last axis of `a` and `b`.
- If `a` is an N-D array and `b` is an M-D array (where ``M>=2``), it is a
sum product over the last axis of `a` and the second-to-last axis of `b`.
Your a has shape (3,), and b has shape (3,2):
In [263]: np.dot(b.T,a)
Out[263]: array([22, 28])
The first case applies: (2,3) with (3,) -> a sum product over the shared dimension of size 3.
In [264]: np.dot(a,b)
Out[264]: array([22, 28])
The second case applies: (3,) with (3,2) -> a sum product over the last axis of the (3,) and the second-to-last axis of the (3,2), again the shared size 3.
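Writing the two sum products out as plain loops makes it visible that they are the same arithmetic: since b.T[i, k] is b[k, i], both compute the sum over k of a[k] * b[k, i]. This is an illustrative sketch, not how NumPy actually implements dot.

```python
import numpy as np

a = np.array([1, 2, 3])
b = np.array([[1, 2], [3, 4], [5, 6]])
n, p = len(a), b.shape[1]   # shared size 3, output size 2

# Case 1, np.dot(b.T, a): sum over the last axis of b.T and the only axis of a
bt = b.T
case1 = np.array([sum(bt[i, k] * a[k] for k in range(n)) for i in range(p)])

# Case 2, np.dot(a, b): sum over the last axis of a and the second-to-last axis of b
case2 = np.array([sum(a[k] * b[k, j] for k in range(n)) for j in range(p)])

print(case1, case2)  # [22 28] [22 28]
```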
"Last axis of A with the second-to-last axis of B" is the basic matrix multiplication rule. It only needs a tweak when B is 1-D and doesn't have a second-to-last axis.
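That rule, including the tweak for a 1-D B, can be sketched with np.tensordot, which contracts whichever axes you name. dot_like is a hypothetical helper written for this illustration, not a NumPy function.

```python
import numpy as np

def dot_like(A, B):
    """Hypothetical helper: contract the last axis of A with the
    second-to-last axis of B, or B's only axis when B is 1-D."""
    A, B = np.asarray(A), np.asarray(B)
    a_axis = A.ndim - 1                           # last axis of A
    b_axis = B.ndim - 2 if B.ndim >= 2 else 0     # the 1-D tweak
    return np.tensordot(A, B, axes=([a_axis], [b_axis]))

a = np.array([1, 2, 3])
b = np.array([[1, 2], [3, 4], [5, 6]])

print(dot_like(a, b), dot_like(b.T, a))  # [22 28] [22 28]
```

Because tensordot keeps the remaining axes of A followed by the remaining axes of B, the same helper also reproduces the np.dot output layout for N-D inputs.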
The np.matmul rules are stated in terms of adding a dimension and later removing it:
- If the first argument is 1-D, it is promoted to a matrix by
prepending a 1 to its dimensions. After matrix multiplication
the prepended 1 is removed.
- If the second argument is 1-D, it is promoted to a matrix by
appending a 1 to its dimensions. After matrix multiplication
the appended 1 is removed.
(3,) with (3,2) => (1,3) with (3,2) => (1,2) => (2,)
(2,3) with (3,) => (2,3) with (3,1) => (2,1) => (2,)
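The two shape chains above can be replayed step by step with reshape and squeeze. This is a sketch of what the documented promotion rule amounts to, not a claim about matmul's internal implementation.

```python
import numpy as np

a = np.array([1, 2, 3])
b = np.array([[1, 2], [3, 4], [5, 6]])

# (3,) with (3,2) => (1,3) with (3,2) => (1,2) => (2,)
left = a.reshape(1, -1)                  # prepend a 1: shape (1, 3)
r1 = np.squeeze(left @ b, axis=0)        # (1, 2), remove the prepended 1 -> (2,)

# (2,3) with (3,) => (2,3) with (3,1) => (2,1) => (2,)
right = a.reshape(-1, 1)                 # append a 1: shape (3, 1)
r2 = np.squeeze(b.T @ right, axis=1)     # (2, 1), remove the appended 1 -> (2,)

print(r1, r2)  # [22 28] [22 28]
```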