I have a specific matrix with dimensions (nz,ny,nx) and another matrix with dimensions (ny,nx). In this other matrix are specific values and for instance I want to sum all the points in this first 3-dimensional matrix at locations where the second matrix has a specific value.
I am doing the following:
kkw = np.squeeze(np.where(wz==wval))
which has (2,X) elements and when I now try to do:
serout = np.sum(dum[:,kkw],axis=(1,2))
I end with something that is completely wrong or even uses a lot of memory and gets killed in the end.
What is the correct way to broadcast/sum the values in the original matrix using another matrix for mask and a specific mask value to make this sum?
Here is the small example of the problem:
nz,ny,nx = 3,20,10
datain = np.random.random((nz,ny,nx))
mask = np.array(5 5.*np.random.random((ny,nx)),dtype='int');
kkw = np.squeeze(np.where(mask==5))
dataout = np.sum(datain[:,kkw],axis=(1,2))
So, in the end I would expect dataout
to have 3 elements, but it has dimensions (3,20). Clearly, I am not doing the broadcasting correctly.
CodePudding user response:
Your sample arrays:
In [165]: nz,ny,nx = 3,20,10
...: datain = np.random.random((nz,ny,nx))
...: mask = np.array(5 5.*np.random.random((ny,nx)),dtype='int');
...:
In [166]: datain.shape
Out[166]: (3, 20, 10)
In [167]: mask.shape
Out[167]: (20, 10)
Indices where mask
is 5:
In [170]: np.where(mask==5)
Out[170]:
(array([ 0, 0, 0, 0, 1, 1, 1, 2, 3, 3, 3, 4, 4, 4, 4, 5, 6,
6, 6, 7, 7, 8, 8, 9, 9, 9, 9, 10, 10, 10, 11, 12, 13, 13,
13, 13, 13, 15, 15, 15, 16, 16, 17, 17, 17, 18, 18, 18, 18, 18, 19,
19]),
array([1, 3, 6, 9, 3, 6, 8, 9, 1, 5, 7, 0, 2, 3, 6, 9, 1, 6, 9, 8, 9, 3,
5, 1, 3, 4, 8, 0, 4, 9, 0, 2, 0, 3, 5, 6, 8, 2, 5, 9, 0, 1, 2, 5,
8, 0, 4, 6, 8, 9, 3, 7]))
I haven't seen squeeze
applied to this, but all it's doing is converting the tuple of arrays to a (2,52) array. np.argwhere
applies np.transpose
, producing a (52,2) array.
In [171]: np.squeeze(_)
Out[171]:
array([[ 0, 0, 0, 0, 1, 1, 1, 2, 3, 3, 3, 4, 4, 4, 4, 5,
6, 6, 6, 7, 7, 8, 8, 9, 9, 9, 9, 10, 10, 10, 11, 12,
13, 13, 13, 13, 13, 15, 15, 15, 16, 16, 17, 17, 17, 18, 18, 18,
18, 18, 19, 19],
[ 1, 3, 6, 9, 3, 6, 8, 9, 1, 5, 7, 0, 2, 3, 6, 9,
1, 6, 9, 8, 9, 3, 5, 1, 3, 4, 8, 0, 4, 9, 0, 2,
0, 3, 5, 6, 8, 2, 5, 9, 0, 1, 2, 5, 8, 0, 4, 6,
8, 9, 3, 7]])
The where
tuple is easier to use when indexing.
In [172]: I,J = np.where(mask==5)
In [173]: I.shape
Out[173]: (52,)
Applied to mask
itself we get all the 5
values:
In [174]: mask[I,J]
Out[174]:
array([5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5,
5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5,
5, 5, 5, 5, 5, 5, 5, 5])
Applied to the 3d array:
In [175]: datain[:,I,J].shape
Out[175]: (3, 52)
In [176]: datain[:,I,J].sum(axis=1)
Out[176]: array([28.01960227, 26.66364387, 26.24709875])
datain[:,kkw]
produces a (3,2,52,10) array. It's applying the (2,52) array to index just the size 20 dimension. Converting the where
tuple to a 2d array does not help with this indexing. When indexing, the distinction between tuples, lists, and arrays is very important.