Change numpy 2d array at the positions described in list of lists

I have a numpy M*N array.

import numpy
numpy.random.seed(23)
A = numpy.round(numpy.random.random((4, 10)), 2)

array([[0.59, 0.77, 0.66, 0.56, 0.18, 0.24, 0.51, 0.4 , 0.48, 0.96],
       [0.9 , 0.51, 0.82, 0.83, 0.23, 0.08, 0.47, 0.88, 0.15, 0.23],
       [0.92, 0.13, 0.92, 0.23, 0.62, 0.95, 0.26, 0.45, 0.97, 0.24],
       [0.2 , 0.69, 0.85, 0.45, 0.1 , 0.62, 0.08, 0.05, 0.35, 0.91]])

Then I have a list of M lists of indexes:

ind = [[1,2], 
       [4,7,8,9], 
       [3,6,7], 
       [4,5,1]]

Each list contains the column indexes to be zeroed in the corresponding row, i.e. in row #0 the elements at indexes 1 and 2 should be set to zero, and so on.

And I should obtain:

array([[0.59, 0.  , 0.  , 0.56, 0.18, 0.24, 0.51, 0.4 , 0.48, 0.96],
       [0.9 , 0.51, 0.82, 0.83, 0.  , 0.08, 0.47, 0.  , 0.  , 0.  ],
       [0.92, 0.13, 0.92, 0.  , 0.62, 0.95, 0.  , 0.  , 0.97, 0.24],
       [0.2 , 0.  , 0.85, 0.45, 0.  , 0.  , 0.08, 0.05, 0.35, 0.91]])

The brute-force ("forehead") solution is:

for i in range(len(ind)):
    A[i, ind[i]] = 0

But it's too slow. Is it possible to have a "vectorized" solution?

CodePudding user response：

To answer the question in the post,

Is it possible to have a "vectorized" solution?

No, not directly. Your ind list is ragged (its sublists have different lengths), so it cannot be used as-is as a rectangular numpy index array. However, depending on how you acquire these index values, changing that process slightly makes the task trivial:

rows = [0, 0, 1, 1, 1, 1, 2, 2, 2, 3, 3, 3]
cols = [1, 2, 4, 7, 8, 9, 3, 6, 7, 4, 5, 1]

A[rows, cols] = 0

If you can produce your indices this way in the first place, then you're golden.
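As a rough sketch of what "producing the indices this way" could look like: suppose whatever generates your indices can hand you (row, column) pairs instead of one list per row. The `pairs` list and the stand-in array below are assumptions for illustration only, not something taken from your code.

```python
import numpy as np

# Stand-in data (assumption): a 4x10 array with no zeros in it.
A = np.arange(40, dtype=float).reshape(4, 10) + 1.0

# Hypothetical output of your index-generating step: one (row, col) pair
# per element to zero, instead of a ragged list of lists.
pairs = [(0, 1), (0, 2),
         (1, 4), (1, 7), (1, 8), (1, 9),
         (2, 3), (2, 6), (2, 7),
         (3, 4), (3, 5), (3, 1)]

# Split the (12, 2) array into two flat index arrays of equal length.
rows, cols = np.array(pairs).T

# One fancy-indexing assignment zeroes every listed position.
A[rows, cols] = 0
```

The point is only that rows and cols end up as two flat sequences of equal length, which is the form numpy's integer indexing expects.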

Again, as others have mentioned, this information (the rows and columns) is already present in your indices. Depending on how they were generated in the first place (calculated, read from a file, whatever; I suggest you share this as an edit to the body of the question), you should in principle be able to generate them in the appropriate format directly. That removes the need for your "forehead" workaround, and for transforming the indices into a numpy-compatible format after the fact.

So, TL;DR -- if you can generate your indices in the format I've shown above in the first place, do so. If not, then there's no point in using anything other than the "forehead" solution.

CodePudding user response：

Your iterative solution:

In [78]: %%timeit 
    ...: for i in range(len(ind)):
    ...:     A[i, ind[i]] = 0
34.4 µs ± 118 ns per loop (mean ± std. dev. of 7 runs, 10,000 loops each)

The alternative that I suggested in a comment:

In [79]: d=np.arange(4).repeat([len(i) for i in ind])    
In [80]: d
Out[80]: array([0, 0, 1, 1, 1, 1, 2, 2, 2, 3, 3, 3])    
In [81]: A[d,np.hstack(ind)]
Out[81]: array([0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.])
    
In [83]: %%timeit
    ...: d=np.arange(4).repeat([len(i) for i in ind])
    ...: A[d,np.hstack(ind)] = 0
37.1 µs ± 71.1 ns per loop (mean ± std. dev. of 7 runs, 10,000 loops each)

This alternative is a bit slower here. However, the relative speeds will vary for real-world cases, depending on the shape of A and the number of indices in ind.
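To try this on something closer to your real data, a small script along these lines may help. The array size, the number of indices per row, and the function names are all assumptions made for the sketch; plug in your own A and ind. It also assumes every sublist is non-empty.

```python
import timeit
import numpy as np

rng = np.random.default_rng(0)
M, N = 2000, 100  # assumed sizes, adjust to your case
A = rng.random((M, N))

# Ragged index list: each row gets 1 to 9 distinct column indices.
ind = [rng.choice(N, size=int(rng.integers(1, 10)), replace=False).tolist()
       for _ in range(M)]

def loop_version(a, ind):
    # The iterative solution from the question.
    for i in range(len(ind)):
        a[i, ind[i]] = 0
    return a

def flat_version(a, ind):
    # Flatten the ragged list into matching row and column index arrays.
    rows = np.arange(len(ind)).repeat([len(i) for i in ind])
    cols = np.concatenate(ind)
    a[rows, cols] = 0
    return a

# Both versions should zero exactly the same positions.
same = np.array_equal(loop_version(A.copy(), ind), flat_version(A.copy(), ind))
print("same result:", same)

# Each timing includes the cost of A.copy(), which is the same for both.
t_loop = timeit.timeit(lambda: loop_version(A.copy(), ind), number=20)
t_flat = timeit.timeit(lambda: flat_version(A.copy(), ind), number=20)
print(f"loop: {t_loop:.4f} s, flat: {t_flat:.4f} s")
```

Which one wins depends on your machine and on the shapes involved, so treat the numbers as a measurement of your case rather than a general result.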