python-numpy, assigning the nearest valid value of a reference array

Time:02-03

Here are toy NumPy arrays:

import numpy as np

nrow = 10
ar_label = np.arange(nrow**2).reshape(nrow, nrow)
ar_label[1:4, 1:4] = 100
ar_label[6:9, 2:5] = 200
ar_label[2:5, 6:9] = 300
ar_label = np.where(ar_label<100, np.nan, ar_label)

ar_label

array([[ nan,  nan,  nan,  nan,  nan,  nan,  nan,  nan,  nan,  nan],
       [ nan, 100., 100., 100.,  nan,  nan,  nan,  nan,  nan,  nan],
       [ nan, 100., 100., 100.,  nan,  nan, 300., 300., 300.,  nan],
       [ nan, 100., 100., 100.,  nan,  nan, 300., 300., 300.,  nan],
       [ nan,  nan,  nan,  nan,  nan,  nan, 300., 300., 300.,  nan],
       [ nan,  nan,  nan,  nan,  nan,  nan,  nan,  nan,  nan,  nan],
       [ nan,  nan, 200., 200., 200.,  nan,  nan,  nan,  nan,  nan],
       [ nan,  nan, 200., 200., 200.,  nan,  nan,  nan,  nan,  nan],
       [ nan,  nan, 200., 200., 200.,  nan,  nan,  nan,  nan,  nan],
       [ nan,  nan,  nan,  nan,  nan,  nan,  nan,  nan,  nan,  nan]])
np.random.seed(11)
ar_rand = np.random.randint(0, nrow*3, size=nrow**2).reshape(nrow, nrow)
ar_rand = np.where(ar_rand==0, ar_rand, np.nan)

ar_rand

array([[nan, nan, nan, nan, nan, nan, nan, nan, nan, nan],
       [nan, nan, nan, nan, nan,  0., nan, nan, nan, nan],
       [nan, nan, nan, nan, nan, nan, nan, nan, nan, nan],
       [nan, nan, nan, nan, nan, nan, nan, nan,  0., nan],
       [nan,  0., nan, nan, nan, nan, nan, nan, nan, nan],
       [nan, nan, nan, nan, nan, nan,  0., nan, nan, nan],
       [nan, nan, nan, nan, nan, nan, nan, nan, nan, nan],
       [nan, nan, nan, nan, nan, nan,  0., nan, nan, nan],
       [nan, nan, nan, nan, nan, nan, nan, nan, nan, nan],
       [nan, nan,  0., nan, nan, nan, nan, nan, nan, nan]])

Now I want to replace each zero in ar_rand with the value of the nearest non-nan element of ar_label, where "nearest" means the smallest Euclidean distance over the two axes.

For example, the leftmost zero in ar_rand will be replaced with 100, the bottommost one will be replaced with 200, and so on.

A solution using NumPy or Xarray will be preferred, but ones using other libraries are also welcome.

The solution shouldn't depend on the specific distribution of non-nan values in ar_label, as the real data I am working with is distributed differently.

Thank you.

CodePudding user response:

The following method avoids loops at the expense of memory.

First, I defined an array containing, for each element of ar_rand, its row and column index:

ids = np.stack(
    np.meshgrid(np.arange(ar_label.shape[0]), np.arange(ar_label.shape[1]))
).T #shape (10, 10, 2)

After that, I computed the Euclidean distances between all possible pairs of indices in the array:

euclidean_ids_distances = np.sqrt(((ids.reshape(-1, 2)[None,:]-ids.reshape(-1, 2)[:,None,:])**2).sum(-1)).reshape(*ar_label.shape,*ar_label.shape)
#shape (10, 10, 10, 10)

The above array is quite large (nrow**4 elements) and would cause memory problems for bigger nrow. It may look confusing, but it's simpler than it seems. In practice, if we want to see the Euclidean distances between the element [0,0] and all the other ones, we can find them in euclidean_ids_distances[0,0]:

import matplotlib.pyplot as plt

plt.imshow(euclidean_ids_distances[0,0], cmap="Greys")

[Image: distance matrix for element [0,0]]

Same thing for the element [6,2] (for example):

[Image: distance matrix for element [6,2]]

In this way, for each non-null element of ar_rand, I can find the argmin of the distances in euclidean_ids_distances, considering only the non-null ar_label ids:

label_ids = ids[~np.isnan(ar_label)][
    euclidean_ids_distances[~np.isnan(ar_rand)][:, ~np.isnan(ar_label)].argmin(-1)
]
#shape (6, 2)
#where 6 is the number of non-null ar_rand elements, 2 is the pair of coordinates (row and col)

Finally, I created a copy of ar_rand and replaced its non-null values with the values of ar_label at the indices given by label_ids:

ar_rand_copy = ar_rand.copy()
ar_rand_copy[~np.isnan(ar_rand_copy)] = ar_label[label_ids[:,0], label_ids[:,1]]

# array([[ nan,  nan,  nan,  nan,  nan,  nan,  nan,  nan,  nan,  nan],
#        [ nan,  nan,  nan,  nan,  nan, 300.,  nan,  nan,  nan,  nan],
#        [ nan,  nan,  nan,  nan,  nan,  nan,  nan,  nan,  nan,  nan],
#        [ nan,  nan,  nan,  nan,  nan,  nan,  nan,  nan, 300.,  nan],
#        [ nan, 100.,  nan,  nan,  nan,  nan,  nan,  nan,  nan,  nan],
#        [ nan,  nan,  nan,  nan,  nan,  nan, 300.,  nan,  nan,  nan],
#        [ nan,  nan,  nan,  nan,  nan,  nan,  nan,  nan,  nan,  nan],
#        [ nan,  nan,  nan,  nan,  nan,  nan, 200.,  nan,  nan,  nan],
#        [ nan,  nan,  nan,  nan,  nan,  nan,  nan,  nan,  nan,  nan],
#        [ nan,  nan, 200.,  nan,  nan,  nan,  nan,  nan,  nan,  nan]])
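
If the four-dimensional distance array is too large (it holds nrow**4 values), the same argmin can be taken over a much smaller array that only pairs the cells to fill with the labelled cells. The following is a minimal sketch under that assumption, not a drop-in replacement for the code above. The helper name fill_from_nearest is made up for illustration, and the toy arrays are rebuilt by hand from the question so the block runs on its own.

```python
import numpy as np

def fill_from_nearest(ar_rand, ar_label):
    """Replace non-NaN cells of ar_rand with the nearest non-NaN value of ar_label."""
    src = np.argwhere(~np.isnan(ar_label))  # (n_label, 2) row/col of labelled cells
    dst = np.argwhere(~np.isnan(ar_rand))   # (n_fill, 2) row/col of cells to fill
    # Squared distances between every fill cell and every labelled cell,
    # shape (n_fill, n_label); the square root is not needed for argmin.
    d2 = ((dst[:, None, :] - src[None, :, :]) ** 2).sum(-1)
    nearest = src[d2.argmin(-1)]            # (n_fill, 2) coordinates of the nearest label
    out = ar_rand.copy()
    out[dst[:, 0], dst[:, 1]] = ar_label[nearest[:, 0], nearest[:, 1]]
    return out

# Toy data rebuilt from the question
nrow = 10
ar_label = np.full((nrow, nrow), np.nan)
ar_label[1:4, 1:4] = 100
ar_label[6:9, 2:5] = 200
ar_label[2:5, 6:9] = 300

ar_rand = np.full((nrow, nrow), np.nan)
zero_cells = [(1, 5), (3, 8), (4, 1), (5, 6), (7, 6), (9, 2)]  # zeros shown in the question
for r, c in zero_cells:
    ar_rand[r, c] = 0.0

result = fill_from_nearest(ar_rand, ar_label)
print(result)
```

The intermediate array here has one row per cell to fill and one column per labelled cell, so its size depends on how many of each there are rather than on nrow**4.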

CodePudding user response:

Here is my solution. The logic is:

  1. create a NumPy array that contains the distances between a zero element of ar_rand and all non-nan elements of ar_label
  2. get the coordinates of the element of ar_label with the shortest distance
  3. replace the element of ar_rand with the element of ar_label at the coordinates from step 2
  4. repeat steps 1-3 for all zero elements

This is written as:

i_rand, j_rand = np.where(ar_rand == 0.0)
i_label, j_label = np.where(ar_label == ar_label)  # NaN != NaN, so this keeps only the non-nan elements

ar_rand_rep = np.zeros_like(ar_rand) * np.nan
for n in range(len(i_rand)):  # apply over grids to be replaced
    ar_dist = np.array([np.sqrt((i_rand[n] - i_label[i])**2 + (j_rand[n] - j_label[i])**2) for i in range(len(i_label))])
    argsort = ar_dist.argsort()
    ar_dist = np.take_along_axis(ar_dist, argsort, axis=0)
    i_label = np.take_along_axis(i_label, argsort, axis=0)
    j_label = np.take_along_axis(j_label, argsort, axis=0)
    ar_rand_rep[i_rand[n], j_rand[n]] = ar_label[i_label[0], j_label[0]]

ar_rand_rep

array([[ nan,  nan,  nan,  nan,  nan,  nan,  nan,  nan,  nan,  nan],
       [ nan,  nan,  nan,  nan,  nan, 300.,  nan,  nan,  nan,  nan],
       [ nan,  nan,  nan,  nan,  nan,  nan,  nan,  nan,  nan,  nan],
       [ nan,  nan,  nan,  nan,  nan,  nan,  nan,  nan, 300.,  nan],
       [ nan, 100.,  nan,  nan,  nan,  nan,  nan,  nan,  nan,  nan],
       [ nan,  nan,  nan,  nan,  nan,  nan, 300.,  nan,  nan,  nan],
       [ nan,  nan,  nan,  nan,  nan,  nan,  nan,  nan,  nan,  nan],
       [ nan,  nan,  nan,  nan,  nan,  nan, 200.,  nan,  nan,  nan],
       [ nan,  nan,  nan,  nan,  nan,  nan,  nan,  nan,  nan,  nan],
       [ nan,  nan, 200.,  nan,  nan,  nan,  nan,  nan,  nan,  nan]])

Other solutions are of course welcome, especially ones without a loop.

CodePudding user response:

You can try this:

import numpy as np

nrow = 10
ar_label = np.arange(nrow**2).reshape(nrow, nrow)
ar_label[1:4, 1:4] = 100
ar_label[6:9, 2:5] = 200
ar_label[2:5, 6:9] = 300
ar_label = np.where(ar_label<100, np.nan, ar_label)

np.random.seed(11)
ar_rand = np.random.randint(0, nrow*3, size=nrow**2).reshape(nrow, nrow)
ar_rand = np.where(ar_rand==0, ar_rand, np.nan)

def distance2d(xp, yp, x, y):
    return np.hypot(x - xp, y - yp)
    
def unstack(a, axis):
    return np.moveaxis(a, axis, 0)

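yy, xx = np.mgrid[0:nrow, 0:nrow]

z = np.dstack((yy, xx))  # (row, col) coordinates of every cell

blanks = z[~np.isnan(ar_rand)]   # coordinates of the cells to fill
values = z[~np.isnan(ar_label)]  # coordinates of the labelled cells

xv, yv = unstack(values, -1)

# compute distances between blanks and values only,
# which saves memory
for p in blanks:
    xb, yb = p
    idx = distance2d(xb, yb, xv, yv).argmin()  # index of the nearest labelled cell
    xc, yc = values[idx]
    ar_rand[xb, yb] = ar_label[xc, yc]  # note: overwrites ar_rand in place

print(ar_rand)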

The loop can be sped up with numba for large arrays.
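
As a rough illustration of the numba remark, the search can be written as plain nested loops and compiled with numba's njit decorator. This is only a sketch: the function name nearest_label_fill is hypothetical, the toy arrays are rebuilt by hand from the question, and the fallback lets the code run uncompiled when numba is not installed. Whether it is actually faster depends on the array sizes.

```python
import numpy as np

try:
    from numba import njit
except ImportError:  # numba not available: run as ordinary Python
    def njit(func):
        return func

@njit
def nearest_label_fill(ar_rand, ar_label):
    """For each non-NaN cell of ar_rand, copy the nearest non-NaN value of ar_label."""
    out = ar_rand.copy()
    nr, nc = ar_rand.shape
    for i in range(nr):
        for j in range(nc):
            if np.isnan(ar_rand[i, j]):
                continue  # nothing to fill here
            best = np.inf
            for k in range(nr):
                for l in range(nc):
                    if np.isnan(ar_label[k, l]):
                        continue  # not a labelled cell
                    di = i - k
                    dj = j - l
                    d2 = float(di * di + dj * dj)  # squared Euclidean distance
                    if d2 < best:
                        best = d2
                        out[i, j] = ar_label[k, l]
    return out

# Toy data rebuilt from the question
nrow = 10
ar_label = np.full((nrow, nrow), np.nan)
ar_label[1:4, 1:4] = 100
ar_label[6:9, 2:5] = 200
ar_label[2:5, 6:9] = 300

ar_rand = np.full((nrow, nrow), np.nan)
for r, c in [(1, 5), (3, 8), (4, 1), (5, 6), (7, 6), (9, 2)]:  # zeros shown in the question
    ar_rand[r, c] = 0.0

filled = nearest_label_fill(ar_rand, ar_label)
print(filled)
```

Unlike the loop above, this version returns a new array and leaves ar_rand untouched, and it needs no intermediate distance array at all.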

CodePudding user response:

Another way you could go that might be faster and doesn't reinvent the wheel as much is to use scipy.interpolate.NearestNDInterpolator.

import scipy.interpolate

ar_label_finite = np.isfinite(ar_label)

interpolant = scipy.interpolate.NearestNDInterpolator(
    x=np.argwhere(ar_label_finite),
    y=ar_label[ar_label_finite],
)

mask = ar_rand == 0
ar_rand_replaced = ar_rand.copy()
ar_rand_replaced[mask] = interpolant(*np.nonzero(mask))

print(ar_rand_replaced)

which outputs

[[ nan  nan  nan  nan  nan  nan  nan  nan  nan  nan]
 [ nan  nan  nan  nan  nan 300.  nan  nan  nan  nan]
 [ nan  nan  nan  nan  nan  nan  nan  nan  nan  nan]
 [ nan  nan  nan  nan  nan  nan  nan  nan 300.  nan]
 [ nan 100.  nan  nan  nan  nan  nan  nan  nan  nan]
 [ nan  nan  nan  nan  nan  nan 300.  nan  nan  nan]
 [ nan  nan  nan  nan  nan  nan  nan  nan  nan  nan]
 [ nan  nan  nan  nan  nan  nan 200.  nan  nan  nan]
 [ nan  nan  nan  nan  nan  nan  nan  nan  nan  nan]
 [ nan  nan 200.  nan  nan  nan  nan  nan  nan  nan]]
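
A related option, sketched here as an assumption rather than as part of the answer above, is to query a k-d tree directly with scipy.spatial.KDTree. It returns both the distance to and the index of the nearest labelled cell, which helps if the distance itself is also of interest. The toy arrays are rebuilt by hand from the question so the block runs on its own.

```python
import numpy as np
from scipy.spatial import KDTree

# Toy data rebuilt from the question
nrow = 10
ar_label = np.full((nrow, nrow), np.nan)
ar_label[1:4, 1:4] = 100
ar_label[6:9, 2:5] = 200
ar_label[2:5, 6:9] = 300

ar_rand = np.full((nrow, nrow), np.nan)
for r, c in [(1, 5), (3, 8), (4, 1), (5, 6), (7, 6), (9, 2)]:  # zeros shown in the question
    ar_rand[r, c] = 0.0

label_mask = np.isfinite(ar_label)
label_coords = np.argwhere(label_mask)  # (n_label, 2) row/col pairs, row-major order
label_values = ar_label[label_mask]     # values in the same row-major order

fill_mask = ar_rand == 0
fill_coords = np.argwhere(fill_mask)    # (n_fill, 2) row/col pairs

tree = KDTree(label_coords)
dist, idx = tree.query(fill_coords)     # nearest neighbour (k=1) of each cell to fill

ar_rand_replaced = ar_rand.copy()
ar_rand_replaced[fill_mask] = label_values[idx]

print(ar_rand_replaced)
print(dist)  # Euclidean distance from each filled cell to its nearest label
```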