Replacing values in n-dimensional tensor given indices from np.argwhere()

I'm somewhat new to numpy so this might be a dumb question, but here goes:

Let's say I have a tensor of any shape and size, say (100,5,5) or (3,3,10,15,4). I have a randomly generated list of indices for points I want to replace with np.nan. For a (3,3,3) test case, it would be as follows:

>> data = np.random.randn(3,3,3)
>> data
array([[[ 0.21368315, -1.42814113,  1.23021783],
        [ 0.25835315,  0.44775156, -1.20489094],
        [ 0.25928972,  0.39486046, -1.79189447]],

       [[ 2.24080908, -0.89617961, -0.29550817],
        [ 0.21756087,  1.33996913, -1.24418745],
        [-0.63617598,  0.56848439,  0.8175564 ]],

       [[ 0.61367002, -1.16104071, -0.53488283],
        [ 1.0363354 , -0.76888041,  1.24524786],
        [-0.84329375, -0.61744489,  1.50502058]]])

>> idxs = np.argwhere(np.isfinite(data))
>> dropidxs = idxs[np.random.choice(idxs.shape[0], 3, replace=False)]
>> dropidxs
array([[1, 1, 1],
       [2, 0, 2],
       [2, 1, 0]])

How do I replace the corresponding values? Previously, when I was only dealing with the 3D case, I did it using the following.

for idx in dropidxs:
    i, j, k = idx  # each row of dropidxs is already one (i, j, k) index
    missingCube[i, j, k] = np.nan

But now, I want the function to be able to handle tensors of any size. I've tried

for idx in dropidxs:
    missingCube[idx] = np.nan

and

missingCube[dropidxs] = np.nan

But both (unsurprisingly) end up replacing entire slices along axis=0 rather than single elements. How should I approach this? Is there an easier way to achieve what I'm trying to do?

CodePudding user response:

Is this what you're looking for?

import numpy as np
x = np.random.randn(10, 3, 3, 3)

new_value = 0
x[x < 0] = new_value

x[x == -np.inf] = 0

CodePudding user response:

In [486]: data = np.random.randn(3,3,3)

With this creation all terms are finite, so nonzero returns a tuple of three (27,) arrays, one per dimension:

In [487]: idx = np.nonzero(np.isfinite(data))
In [488]: len(idx)
Out[488]: 3
In [489]: idx[0].shape
Out[489]: (27,)

argwhere produces the same numbers, but in a 2d array:

In [490]: idxs = np.argwhere(np.isfinite(data))
In [491]: idxs.shape
Out[491]: (27, 3)

So you select a subset.

In [492]: dropidxs = idxs[np.random.choice(idxs.shape[0], 3, replace=False)]
In [493]: dropidxs.shape
Out[493]: (3, 3)
In [494]: dropidxs
Out[494]: 
array([[1, 1, 0],
       [2, 1, 2],
       [2, 1, 1]])

We could have generated the same subset with x = np.random.choice(...) and applied that x to each of the arrays in idx. But in this case, the argwhere array is easier to work with.
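For comparison, here is a rough sketch of that nonzero-based alternative. The seed and the use of default_rng are my own assumptions, added only to make the run repeatable; x is the name used in the sentence above.

```python
import numpy as np

rng = np.random.default_rng(0)           # assumed seed, only for repeatability
data = rng.standard_normal((3, 3, 3))

idx = np.nonzero(np.isfinite(data))      # tuple of 3 arrays, each of shape (27,)
x = rng.choice(idx[0].shape[0], 3, replace=False)  # 3 distinct positions

sub = tuple(a[x] for a in idx)           # apply x to every array in the tuple
data[sub] = np.nan                       # sub is already a valid index tuple

n_missing = int(np.isnan(data).sum())
```

The result is already in tuple form, so no further conversion is needed before indexing.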

But to apply that array to indexing we still need a tuple of arrays:

In [495]: tup = tuple([dropidxs[:,i] for i in range(3)])
In [496]: tup
Out[496]: (array([1, 2, 2]), array([1, 1, 1]), array([0, 2, 1]))
In [497]: data[tup]
Out[497]: array([-0.27965058,  1.2981397 ,  0.4501406 ])
In [498]: data[tup]=np.nan
In [499]: data
Out[499]: 
array([[[-0.4899279 ,  0.83352547, -1.03798762],
        [-0.91445783,  0.05777183,  0.19494065],
        [ 0.6835925 , -0.47846423,  0.13513958]],

       [[-0.08790631,  0.30224828, -0.39864576],
        [        nan, -0.77424244,  1.4788093 ],
        [ 0.41915952, -0.09335664, -0.47359613]],

       [[-0.40281937,  1.64866377, -0.40354504],
        [ 0.74884493,         nan,         nan],
        [ 0.13097487, -1.63995208, -0.98857852]]])

Or we could index with:

In [500]: data[dropidxs[:,0],dropidxs[:,1],dropidxs[:,2]]
Out[500]: array([nan, nan, nan])

Actually, a transpose of dropidxs might be more convenient:

In [501]: tdrop = dropidxs.T
In [502]: tuple(tdrop)
Out[502]: (array([1, 2, 2]), array([1, 1, 1]), array([0, 2, 1]))
In [503]: data[tuple(tdrop)]
Out[503]: array([nan, nan, nan])

Sometimes we can use * to expand a list/array into a tuple, but not inside an index in Python versions before 3.11:

In [504]: data[*tdrop]
  File "<ipython-input-504-cb619d907adb>", line 1
    data[*tdrop]
         ^
SyntaxError: invalid syntax

but we can create the tuple with:

In [506]: data[(*tdrop,)]
Out[506]: array([nan, nan, nan])
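Putting the pieces together for any number of dimensions, something like the following should work. It is a minimal sketch: drop_random and its rng parameter are hypothetical names of mine, and it returns a copy rather than modifying the input.

```python
import numpy as np

def drop_random(data, n, rng=None):
    """Return a float copy of `data` with `n` random finite entries set to NaN.

    Hypothetical helper for illustration; works for any number of dimensions.
    """
    rng = np.random.default_rng() if rng is None else rng
    out = np.array(data, dtype=float)          # copy; NaN needs a float dtype
    idxs = np.argwhere(np.isfinite(out))       # shape (n_finite, out.ndim)
    dropidxs = idxs[rng.choice(idxs.shape[0], n, replace=False)]
    out[tuple(dropidxs.T)] = np.nan            # one index array per axis
    return out

cube = drop_random(np.ones((3, 3, 3)), 3)
big = drop_random(np.ones((3, 3, 10, 15, 4)), 100)
```

The key step is tuple(dropidxs.T): a tuple of per-axis index arrays selects individual elements, while a single 2d integer array only indexes along axis 0.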

CodePudding user response:

You can choose from flattened indices and convert back to data indices to set elements to np.nan. Here the generator is seeded with 41 so the choice of 3 elements is reproducible (the data itself, drawn with np.random.randn, is not seeded).

import numpy as np

data = np.random.randn(3,3,3)

rng = np.random.default_rng(41)
idx = rng.choice(np.arange(data.size), 3, replace=False)
data[np.unravel_index(idx, data.shape)] = np.nan
data

Output

array([[[ 0.13180452, -0.81228319, -0.04456739],
        [ 0.53060077, -0.2246579 ,  1.83926463],
        [-0.38670047, -0.53703577,  0.49275628]],

       [[ 0.36671354,  1.44012848, -0.57209412],
        [ 0.53960111, -1.06578638,  1.10669842],
        [ 1.1772824 ,         nan, -0.82792041]],

       [[-0.03352594,  0.29351109,  0.57021538],
        [-0.33291872,         nan,  0.04675677],
        [        nan,  2.59450517, -1.9579655 ]]])
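Note that the snippet above samples from every position. If the array may already contain NaN or inf and only finite entries should be eligible, as in the question, one possible variation is to sample from np.flatnonzero instead. This is a sketch under my own assumptions: the 5-d shape and the pre-existing NaN are made up for the demonstration.

```python
import numpy as np

rng = np.random.default_rng(41)
data = rng.standard_normal((3, 3, 10, 15, 4))
data[0, 0, 0, 0, 0] = np.nan                     # pretend one value is already missing

finite_flat = np.flatnonzero(np.isfinite(data))  # flat positions of finite entries
idx = rng.choice(finite_flat, 3, replace=False)  # sample only among those
data[np.unravel_index(idx, data.shape)] = np.nan

n_missing = int(np.isnan(data).sum())
```

Because the already-missing position is excluded from finite_flat, exactly 3 new entries are dropped each time.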