I have a 2d numpy array size 100 x 100. I want to randomly sample values from the "inside" 80 x 80 values so that I can exclude values which are influenced by edge effects. I want to sample from row 10 to row 90 and within that from column 10 to column 90.
However, importantly, I need to retain the original index values from the 100 x 100 grid, so I can't just trim the dataset and move on. If I do that, I am not really solving the edge effect problem because this is occurring within a loop with multiple iterations.
gridsize = 100
new_abundances = np.zeros([100,100],dtype=np.uint8)
min_select = int(np.around(gridsize * 0.10))
max_select = int(gridsize - (np.around(gridsize * 0.10)))
row_idx = np.arange(min_select,max_select)
col_idx = np.arange(min_select,max_select)
indices_random = ?????
Somehow randomly sample from new_abundances only within the rows and columns of row_idx and col_idx set.
What I ultimately need is a list of 250 random indices selected from within the flattened new_abundances array. I need to keep the new_abundances array as 2d to identify the "edges" but once that is done, I need to flatten it to get the indices which are randomly selected.
Desired output:
A 1d list of indices from a flattened new_abundances array.
CodePudding user response:
Would something like this solve your problem?
import numpy as np
np.random.seed(0)
mat = np.random.random(size=(100,100))
x_indices = np.random.randint(low=10, high=90, size=250)
y_indices = np.random.randint(low=10, high=90, size=250)
coordinates = list(zip(x_indices,y_indices))
flat_mat = mat.flatten()
flat_index = x_indices * 100 + y_indices
Then you can access elements using any value from the coordinates list, e.g. mat[coordinates[0]] returns the matrix value at coordinates[0]. The value of coordinates[0] is (38, 45) in my case. If the matrix is flattened, you can calculate the 1D index of the corresponding element as row * 100 + column. In this case, mat[coordinates[0]] == flat_mat[flat_index[0]] holds, where flat_index[0] == 3845 == 100 * 38 + 45.
Please also note that this samples with replacement: the x and y indices are drawn independently, so the same cell of the original data can be selected more than once.
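If repeated cells are a problem, one possible variation is to sample without replacement. The sketch below is not part of the original answer: it assumes a 10-cell margin on a 100 x 100 grid, uses NumPy's newer Generator API instead of np.random.seed, and the names margin, n_samples, inner and inner_flat are made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)  # assumption: Generator API instead of np.random.seed
gridsize = 100
margin = 10        # hypothetical name: number of edge rows/columns to exclude
n_samples = 250    # hypothetical name: how many indices to draw

# Rows/columns 10..89 of the full grid
inner = np.arange(margin, gridsize - margin)

# Every (row, col) pair of the inner 80 x 80 block ...
rows, cols = np.meshgrid(inner, inner, indexing="ij")

# ... converted to flat indices into the full 100 x 100 grid
inner_flat = np.ravel_multi_index((rows.ravel(), cols.ravel()), (gridsize, gridsize))

# replace=False guarantees that the 250 flat indices are all distinct
indices_random = rng.choice(inner_flat, size=n_samples, replace=False)
```

Because inner_flat is built against the full grid shape, the values in indices_random index directly into new_abundances.flatten() without any offset correction.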
Using your notation:
import numpy as np
np.random.seed(0)
gridsize = 100
new_abundances = np.zeros([100,100],dtype=np.uint8)
min_select = int(np.around(gridsize * 0.10))
max_select = int(gridsize - (np.around(gridsize * 0.10)))
x_indices = np.random.randint(low=min_select, high=max_select, size=250)
y_indices = np.random.randint(low=min_select, high=max_select, size=250)
coords = list(zip(x_indices,y_indices))
flat_new_abundances = new_abundances.flatten()
flat_index = x_indices * gridsize + y_indices
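As a sanity check, the flat indices can be mapped back to 2d coordinates to confirm that they stay inside the interior. This is only a sketch: it hard-codes min_select and max_select as 10 and 90, and the names same, rows, cols and flat_index_list are introduced here for illustration. np.ravel_multi_index and np.unravel_index are NumPy's built-in equivalents of the row * gridsize + column arithmetic and its inverse.

```python
import numpy as np

np.random.seed(0)
gridsize = 100
new_abundances = np.zeros([gridsize, gridsize], dtype=np.uint8)
min_select, max_select = 10, 90  # assumption: 10% margin, as in the question

x_indices = np.random.randint(low=min_select, high=max_select, size=250)
y_indices = np.random.randint(low=min_select, high=max_select, size=250)

# Manual row-major formula: flat = row * number_of_columns + column
flat_index = x_indices * gridsize + y_indices

# Same result using NumPy's helper
same = np.ravel_multi_index((x_indices, y_indices), new_abundances.shape)

# Inverse mapping: flat index -> (row, col) in the original 100 x 100 grid
rows, cols = np.unravel_index(flat_index, new_abundances.shape)

# Plain Python list, if a list rather than an array is needed
flat_index_list = flat_index.tolist()
```

Here rows and cols reproduce x_indices and y_indices, so every sampled index still refers to its position in the original 100 x 100 grid.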