Random sample from specific rows and columns of a 2d numpy array (essentially sampling by ignoring e

Time:12-31

I have a 2d numpy array size 100 x 100. I want to randomly sample values from the "inside" 80 x 80 values so that I can exclude values which are influenced by edge effects. I want to sample from row 10 to row 90 and within that from column 10 to column 90.

However, importantly, I need to retain the original index values from the 100 x 100 grid, so I can't just trim the dataset and move on. If I do that, I am not really solving the edge effect problem because this is occurring within a loop with multiple iterations.

import numpy as np

gridsize = 100
new_abundances = np.zeros([100, 100], dtype=np.uint8)
min_select = int(np.around(gridsize * 0.10))
max_select = int(gridsize - np.around(gridsize * 0.10))
row_idx = np.arange(min_select, max_select)
col_idx = np.arange(min_select, max_select)

# indices_random = ?  Somehow randomly sample from new_abundances, but only within the rows and columns in row_idx and col_idx.

What I ultimately need is a list of 250 random indices selected from within the flattened new_abundances array. I need to keep the new_abundances array as 2d to identify the "edges" but once that is done, I need to flatten it to get the indices which are randomly selected.

Desired output:

A 1D list of indices into the flattened new_abundances array.

CodePudding user response:

Would something like this solve your problem?

import numpy as np

np.random.seed(0)
mat = np.random.random(size=(100,100))

x_indices = np.random.randint(low=10, high=90, size=250)
y_indices = np.random.randint(low=10, high=90, size=250)

coordinates = list(zip(x_indices,y_indices))

flat_mat = mat.flatten()
flat_index = x_indices * 100 + y_indices

Then you can access elements using any value from the coordinates list, e.g. mat[coordinates[0]] returns the matrix value at coordinates[0]. The value of coordinates[0] is (38, 45) in my case. If the matrix is flattened, you can calculate the 1D index of the corresponding element as row * number_of_columns + column. In this case, mat[coordinates[0]] == flat_mat[flat_index[0]] holds, where flat_index[0] == 3845 == 100*38 + 45.
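The row-major arithmetic above is the same calculation that np.ravel_multi_index performs, and np.unravel_index reverses it. The following is a minimal sketch of that check, not part of the original answer; the names rows, cols, flat_manual and flat_numpy are illustrative, and it uses the newer np.random.default_rng generator, so the sampled values will differ from those quoted above.

```python
import numpy as np

rng = np.random.default_rng(0)
mat = rng.random((100, 100))

# Sample row and column positions from the interior [10, 90) of the grid.
rows = rng.integers(low=10, high=90, size=250)
cols = rng.integers(low=10, high=90, size=250)

# Manual row-major conversion: row * number_of_columns + column.
flat_manual = rows * mat.shape[1] + cols

# The same conversion done by NumPy.
flat_numpy = np.ravel_multi_index((rows, cols), mat.shape)

# Going back from flat indices to (row, column) pairs.
back_rows, back_cols = np.unravel_index(flat_numpy, mat.shape)
```

Using mat.shape[1] rather than a hard-coded 100 keeps the conversion correct if the grid is not square.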

Please also note that this approach samples with replacement, so the same cell of the original data can be selected more than once.
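If each cell should be selected at most once, one possible alternative is to build the flat indices of every interior cell first and then draw from them without replacement. This is a sketch under that assumption rather than part of the original answer; margin, interior and interior_flat are hypothetical names introduced here.

```python
import numpy as np

rng = np.random.default_rng(0)
gridsize = 100
margin = int(np.around(gridsize * 0.10))

# Row/column positions that lie inside the edge margin: 10..89.
interior = np.arange(margin, gridsize - margin)

# Flat (row-major) index of every interior cell in the full 100 x 100 grid.
rr, cc = np.meshgrid(interior, interior, indexing="ij")
interior_flat = np.ravel_multi_index((rr.ravel(), cc.ravel()), (gridsize, gridsize))

# Draw 250 distinct flat indices from the interior cells only.
indices_random = rng.choice(interior_flat, size=250, replace=False)
```

The resulting indices_random refers to positions in the flattened full-size array, so the original 100 x 100 indexing is preserved.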

Using your notation:

import numpy as np
np.random.seed(0)
gridsize = 100
new_abundances = np.zeros([100,100],dtype=np.uint8)
min_select = int(np.around(gridsize * 0.10))
max_select = int(gridsize - (np.around(gridsize * 0.10)))

x_indices = np.random.randint(low=min_select, high=max_select, size=250)
y_indices = np.random.randint(low=min_select, high=max_select, size=250)
coords = list(zip(x_indices,y_indices))

flat_new_abundances = new_abundances.flatten()
flat_index = x_indices * gridsize + y_indices