What's a good way of setting most elements of an ndarray to zero?

I've got an ndarray with, say, 10,000 rows and 75 columns, and another one with the same number of rows and, say, 3 columns. The second one has integer values.

I want to end up with an array of 10,000 rows and 75 columns with all the elements set to zero except the elements in each row indexed by the values in the corresponding row of the second array.

So starting with z_array and i_array, I want to end up with a_array

>>> z_array
array([[10, 11, 12, 13, 14, 15],
       [10, 11, 12, 13, 14, 15],
       [10, 11, 12, 13, 14, 15],
       [10, 11, 12, 13, 14, 15]])
>>> i_array
array([[0, 2],
       [3, 1],
       [1, 4],
       [2, 3]])
>>> a_array
array([[10,  0, 12,  0,  0,  0],
       [ 0, 11,  0, 13,  0,  0],
       [ 0, 11,  0,  0, 14,  0],
       [ 0,  0, 12, 13,  0,  0]])

I can see two ways of approaching this: either start with an array full of zeros and copy across the relevant elements from z_array; or start with z_array and set all the irrelevant elements to zero. Note that the number of irrelevant elements is typically much, much larger than the number of relevant elements.

Either way, is there a good way of doing the multiple assignments, or do I simply have to loop through them? Or is there a third approach?

I'm wondering if I can use numpy.ufunc.at somehow? I can see how to get a list of indexes for the relevant elements, for example

>>> index_list = [[i, val] for (i, x) in enumerate(i_array) for val in x ]
>>> index_list
[[0, 0], [0, 2], [1, 3], [1, 1], [2, 1], [2, 4], [3, 2], [3, 3]]

And there's a slightly more complex way to get them for the irrelevant elements. But these lists would be big!!

CodePudding user response：

You could use masked arrays:

import numpy as np

def mask_array(z_array, i_array):
    ROWS, COLS = np.shape(z_array)

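    # Build the mask: count how often each column index occurs in each row
    mask = np.zeros((ROWS, COLS))
    for i in range(ROWS):
        np.add.at(mask[i, :], i_array[i], 1)
    # Masked-array convention: True means "hide", so invert the selection
    mask = ~(mask > 0)

    # Hidden elements are replaced by fill_value when the array is filled
    m_array = np.ma.array(z_array, mask=mask, fill_value=0)
    return np.ma.filled(m_array)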

a_array = mask_array(z_array, i_array)
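
The per-row loop used to fill the mask can probably be avoided. A minimal sketch, assuming the same z_array and i_array as in the question (the name keep is only illustrative): build a boolean mask in one call with np.put_along_axis, then select values with np.where.

```python
import numpy as np

# Example data from the question
z_array = np.tile(np.arange(10, 16), (4, 1))
i_array = np.array([[0, 2], [3, 1], [1, 4], [2, 3]])

# Boolean mask that is True only at the column indices listed for each row
keep = np.zeros(z_array.shape, dtype=bool)
np.put_along_axis(keep, i_array, True, axis=1)

# Take z_array where the mask is True and 0 everywhere else
a_array = np.where(keep, z_array, 0)
```

This leaves z_array untouched and needs only one boolean array of the same shape as temporary storage.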

CodePudding user response：

To tell the elements of z_array apart, I defined it as:

array([[ 10,  11,  12,  13,  14,  15],
       [110, 111, 112, 113, 114, 115],
       [210, 211, 212, 213, 214, 215],
       [310, 311, 312, 313, 314, 315]])

One possible solution is to create a zero-filled array with np.zeros_like and then loop over i_array with np.ndenumerate:

result = np.zeros_like(z_array)
# Each entry x of i_array is a column index to keep in row r
for (r, _), x in np.ndenumerate(i_array):
    result[r, x] = z_array[r, x]

For my (changed) source data, the result is:

array([[ 10,   0,  12,   0,   0,   0],
       [  0, 111,   0, 113,   0,   0],
       [  0, 211,   0,   0, 214,   0],
       [  0,   0, 312, 313,   0,   0]])
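
The same copy can be done without a Python-level loop by using integer array indexing. This is a sketch under the assumption that i_array holds valid column indices for every row; the name rows is only illustrative.

```python
import numpy as np

# The modified z_array from this answer: row r starts at 100*r + 10
z_array = np.arange(10, 16) + 100 * np.arange(4)[:, None]
i_array = np.array([[0, 2], [3, 1], [1, 4], [2, 3]])

# Column vector of row numbers; it broadcasts against i_array's shape (4, 2)
rows = np.arange(z_array.shape[0])[:, None]

result = np.zeros_like(z_array)
# Copy only the selected (row, column) pairs from z_array
result[rows, i_array] = z_array[rows, i_array]
```

Because rows has shape (4, 1) and i_array has shape (4, 2), NumPy pairs each row number with every column index in that row, so only the selected elements are touched.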

CodePudding user response：

It seems like you are looking for something similar to np.put_along_axis.

Taking your example, running np.put_along_axis(z_array, i_array, 0, axis=1) gives:

z_array = [[ 0 11  0 13 14 15]
 [10  0 12  0 14 15]
 [10  0 12 13  0 15]
 [10 11  0  0 14 15]]

This is the opposite of what you want. So what I did (not sure how efficient it is) was to make a copy of z_array as a_array, apply np.put_along_axis to z_array, and then set to zero every element of a_array where z_array is still non-zero. Note that this modifies z_array in place.

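import copy

a_array = copy.copy(z_array)                    # keep the original values
np.put_along_axis(z_array, i_array, 0, axis=1)  # zeroes the wanted cells of z_array in place
a_array[z_array != 0] = 0                       # zero every cell that was not selected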

This gives the output you expected:

a_array = [[10  0 12  0  0  0]
 [ 0 11  0 13  0  0]
 [ 0 11  0  0 14  0]
 [ 0  0 12 13  0  0]]

np.put_along_axis documentation

For more options for combining the two arrays, see np.where.
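
If modifying z_array is not acceptable, a variant that may be simpler is to gather the wanted values with np.take_along_axis and write them into a fresh zero array. This is only a sketch using the question's example data; the name wanted is illustrative.

```python
import numpy as np

# Example data from the question
z_array = np.tile(np.arange(10, 16), (4, 1))
i_array = np.array([[0, 2], [3, 1], [1, 4], [2, 3]])

a_array = np.zeros_like(z_array)
# Gather the selected values row by row (same shape as i_array)
wanted = np.take_along_axis(z_array, i_array, axis=1)
# Scatter them into the zero array at the same positions
np.put_along_axis(a_array, i_array, wanted, axis=1)
```

Here only the small i_array-shaped array of wanted values is created as an intermediate, and z_array stays unchanged.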