Select N random rows from a 3D numpy array

I have a 3D array from which I want to take random 'sets' along the 2nd axis (i.e. axis=1), N times. I can do this with a loop, but I will need to do it at least 10000 times, so I am looking for a vectorised solution if possible.

I know that the title and explanation aren't very clear, so this example should help clear things up:

import numpy as np

# Create example data
x = np.array([[[1, 2, 3,  4,  5,  6,  7], [2, 3, 4,  5,  6,  7,  8], [3, 4, 5,  6,  7,  8,  9], [4, 5, 6,  7,  8,  9, 10]],
              [[1, 2, 3,  5,  9, 17, 33], [2, 3, 4,  6, 10, 18, 34], [3, 4, 5,  7, 11, 19, 35], [4, 5, 6,  8, 12, 20, 36]],
              [[1, 2, 5, 10, 17, 26, 37], [2, 3, 6, 11, 18, 27, 38], [3, 4, 7, 12, 19, 28, 39], [4, 5, 8, 13, 20, 29, 40]]])
# x.shape = (3, 4, 7) = (#samples, #spectra, size)

# Generate an array of ints to select one random spectrum from each sample, 'N' times
N = 2  # number of requested 'sets'
specs = np.random.randint(0, x.shape[1], (N, x.shape[0]))  # i.e. random ints in range [0, 4)
# specs.shape = (2, 3) = (N, #samples)

# Extract/Index the chosen spectra from each sample
# First, instantiate a 'temps' list to contain each of the N 'sets'
temps = []
for s in specs:
    # Instantiate another list to contain the current 'set'
    temp = []

    # Loop through the current set of chosen spectra
    for i, si in enumerate(s):
        # Append the chosen spectra, from each sample, to list
        temp.append(x[i, si])

    # Append completed 'set' to the list of all 'sets'
    temps.append(temp)  # temp.shape = (3, 7) = (#samples, size)

# Convert to array
temps = np.array(temps)
# temps.shape = (2, 3, 7) = (N, #samples, size)

EDIT: I will try to explain this example. For each 'set' of data (i.e. for each iteration through the first loop) I want to select one random index from the 2nd axis (i.e. axis=1), for every element in the 1st axis (i.e. axis=0). E.g. if s = [0, 2, 1] then this means the first temp list will contain, respectively: [1, 2, 3, 4, 5, 6, 7], [3, 4, 5, 7, 11, 19, 35], and [2, 3, 6, 11, 18, 27, 38], as these are the 0th, 2nd, and 1st indices of their respective entries along the 1st axis.

ANOTHER EDIT: Based on my example code and previous edit, I will give a numerical example in pseudo-code alongside it.

# Using the same 'x' value as before...

Let specs = [[0, 2, 1], [1, 3, 3]]

# 1st pass through 1st For loop would give:
# (s = [0, 2, 1])
temp = [[1, 2, 3,  4,  5,  6,  7],  # from 1st pass of 2nd For loop
        [3, 4, 5,  7, 11, 19, 35],  # from 2nd pass of 2nd For loop
        [2, 3, 6, 11, 18, 27, 38]]  # from 3rd pass of 2nd For loop

# 2nd pass through 1st For loop would give:
# (s = [1, 3, 3])
temp = [[2, 3, 4,  5,  6,  7,  8],  # from 1st pass of 2nd For loop
        [4, 5, 6,  8, 12, 20, 36],  # from 2nd pass of 2nd For loop
        [4, 5, 8, 13, 20, 29, 40]]  # from 3rd pass of 2nd For loop

# So, the final output of temps would be:
temps = [[[1, 2, 3,  4,  5,  6,  7],
          [3, 4, 5,  7, 11, 19, 35],
          [2, 3, 6, 11, 18, 27, 38]],

         [[2, 3, 4,  5,  6,  7,  8],
          [4, 5, 6,  8, 12, 20, 36],
          [4, 5, 8, 13, 20, 29, 40]]]

After this, I would then combine the spectra along the '#samples' axis, weighting each sample by a random coefficient in [0.0, 1.0), to create a (2, 7) array. I think the solution to this part would be similar to the previous one, so a solution to this particular problem is likely unnecessary but would still be welcome!

So, to summarise, how can I vectorise/make broadcastable the nested loop in my example?

CodePudding user response：

When you have the pattern x[0, idx[0]], x[1, idx[1]], etc., you're probably looking for something like x[np.arange(idx.shape[0]), idx]. Since your index array is 2D, with the sample index on its last axis, build the arange from that last axis and let it broadcast against specs:

result = x[np.arange(specs.shape[-1]),specs]

You can confirm that gives the right result with np.all(temps == result).
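
To make that check concrete, here is a minimal sketch that uses the fixed specs from the question's numerical example instead of random draws. The x built with np.arange is only a stand-in with the same shape as the question's data, and the final weighted-sum step (coeffs, combined) is one assumed reading of the follow-up described in the question, not something the question pins down.

```python
import numpy as np

# Stand-in data with the question's shape: (#samples, #spectra, size)
x = np.arange(3 * 4 * 7).reshape(3, 4, 7)

# Fixed choices from the question's worked example, shape (N, #samples)
specs = np.array([[0, 2, 1], [1, 3, 3]])

# Loop version from the question, written as a nested comprehension
temps = np.array([[x[i, si] for i, si in enumerate(s)] for s in specs])

# Vectorised version: the arange has shape (3,) and specs has shape (2, 3),
# so the two index arrays broadcast to (2, 3); the last axis of x is kept.
result = x[np.arange(specs.shape[-1]), specs]

print(result.shape)                   # (2, 3, 7)
print(np.array_equal(temps, result))  # True

# Assumed follow-up: one random coefficient per (set, sample) pair
rng = np.random.default_rng(0)
coeffs = rng.random(specs.shape)                    # values in [0.0, 1.0)
combined = np.einsum('ns,nsk->nk', coeffs, result)  # shape (N, size)
print(combined.shape)                 # (2, 7)
```

The einsum call sums over the samples axis, so it is equivalent to (coeffs[..., None] * result).sum(axis=1).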

CodePudding user response：

Given the clarifications of your question, you want to select N random rows in a 3D array on axis 1 (second dimension), but independently on axis 0:

Let's call the array a and its three dimensions x, y, z.

An easy way is to draw N*x random indices along y, N for each entry along x, shift each one by the start of its block, then flatten the first two dimensions of the array and index into the result.

Example input (the decimal part .0/.1/.2 tracks the originating index along the first axis):

array([[[ 0. ,  4. ,  8. , 12. , 16. , 20. , 24. ],
        [ 1. ,  5. ,  9. , 13. , 17. , 21. , 25. ],
        [ 2. ,  6. , 10. , 14. , 18. , 22. , 26. ],
        [ 3. ,  7. , 11. , 15. , 19. , 23. , 27. ]],

       [[ 0.1,  4.1,  8.1, 12.1, 16.1, 20.1, 24.1],
        [ 1.1,  5.1,  9.1, 13.1, 17.1, 21.1, 25.1],
        [ 2.1,  6.1, 10.1, 14.1, 18.1, 22.1, 26.1],
        [ 3.1,  7.1, 11.1, 15.1, 19.1, 23.1, 27.1]],

       [[ 0.2,  4.2,  8.2, 12.2, 16.2, 20.2, 24.2],
        [ 1.2,  5.2,  9.2, 13.2, 17.2, 21.2, 25.2],
        [ 2.2,  6.2, 10.2, 14.2, 18.2, 22.2, 26.2],
        [ 3.2,  7.2, 11.2, 15.2, 19.2, 23.2, 27.2]]])

Processing:

N = 2
x, y, z = a.shape

# Option 1: sample with repeats
idx = np.random.randint(y, size=N*x)
# offset of each block in the flattened array: 0, y, 2*y, ...
corr = np.repeat(np.arange(0, (x-1)*y+1, y), N)
idx += corr

# Option 2: sample without repeats (use instead of option 1)
idx = np.concatenate([np.random.choice(list(range(y)), replace=False, size=N) + (i*y) for i in range(x)])

# slice array
a.reshape(x*y, z)[idx]

possible output:

array([[ 0. ,  4. ,  8. , 12. , 16. , 20. , 24. ],
       [ 3. ,  7. , 11. , 15. , 19. , 23. , 27. ],
       [ 1.1,  5.1,  9.1, 13.1, 17.1, 21.1, 25.1],
       [ 3.1,  7.1, 11.1, 15.1, 19.1, 23.1, 27.1],
       [ 0.2,  4.2,  8.2, 12.2, 16.2, 20.2, 24.2],
       [ 1.2,  5.2,  9.2, 13.2, 17.2, 21.2, 25.2]])
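
The flatten-and-offset idea can also be sketched end to end as below. This is only an illustration under some assumptions: it uses np.random.default_rng rather than the legacy np.random functions, rebuilds the example input from a formula, and swaps the per-sample np.random.choice loop for an argsort over random numbers to draw without repeats. The names idx_rep, idx_norep, offsets and rows are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Rebuild the example input: value = row + 4*col + 0.1*sample
x, y, z = 3, 4, 7
a = (np.arange(y)[None, :, None]
     + 4 * np.arange(z)[None, None, :]
     + 0.1 * np.arange(x)[:, None, None])

N = 2

# With repeats: N independent draws in [0, y) for each sample
idx_rep = rng.integers(0, y, size=(x, N))

# Without repeats: sort random numbers per sample and keep the first N
# positions, which gives N distinct indices in [0, y) for each sample
idx_norep = rng.random((x, y)).argsort(axis=1)[:, :N]

# Shift each sample's indices to its block in the flattened (x*y, z) array
offsets = np.arange(x)[:, None] * y
rows = a.reshape(x * y, z)[(idx_norep + offsets).ravel()]

print(rows.shape)  # (6, 7)
```

Swapping idx_norep for idx_rep in the last step gives the variant with repeats. Indexing with a[np.arange(x)[:, None], idx_norep] returns the same rows as a (x, N, z) array, without the reshape.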

CodePudding user response：

Original answer before the question was clarified; see the new answer for independent sampling.

You can get random indices and slice:

N = 2

# get random indices on the first dimension
idx = np.random.choice(np.arange(x.shape[0]), size=N)

# slice
x[idx]

example output (shape: (2, 4, 7)):

array([[[ 1,  2,  5, 10, 17, 26, 37],
        [ 2,  3,  6, 11, 18, 27, 38],
        [ 3,  4,  7, 12, 19, 28, 39],
        [ 4,  5,  8, 13, 20, 29, 40]],

       [[ 1,  2,  3,  4,  5,  6,  7],
        [ 2,  3,  4,  5,  6,  7,  8],
        [ 3,  4,  5,  6,  7,  8,  9],
        [ 4,  5,  6,  7,  8,  9, 10]]])

Example on other dimensions:

# second dimension (axis 1)
idx = np.random.choice(np.arange(x.shape[1]), size=N)
x[:, idx]
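
As a quick shape check, the following sketch compares sampling on the first and second dimensions. It assumes a stand-in array built with np.arange (same shape as the question's x) and uses np.random.default_rng; idx0 and idx1 are hypothetical names.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for the question's array: (#samples, #spectra, size)
x = np.arange(3 * 4 * 7).reshape(3, 4, 7)
N = 2

# N random indices on the first dimension (axis 0)
idx0 = rng.choice(x.shape[0], size=N)
# N random indices on the second dimension (axis 1)
idx1 = rng.choice(x.shape[1], size=N)

print(x[idx0].shape)     # (2, 4, 7)
print(x[:, idx1].shape)  # (3, 2, 7)
```

Note that x[:, idx1] applies the same N indices to every sample, which is why this approach does not give independent sampling per sample.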