Adding randomization to numpy function array_split

Time:11-15

Suppose we have an array arr and want to divide it into pieces while preserving the order of the elements. This is easily done with np.array_split:

import numpy as np
arr = np.array([0,1,2,3,4,5,6,7,8])
pieces = 3
np.array_split(arr,pieces)
>>> [array([0, 1, 2]), array([3, 4, 5]), array([6, 7, 8])]

If arr.size % pieces != 0, the pieces returned by np.array_split have unequal sizes, with the larger pieces first:

arr = np.array([0,1,2,3,4,5,6,7])
pieces = 3
np.array_split(arr,pieces)
>>> [array([0, 1, 2]), array([3, 4, 5]), array([6, 7])]

What is the best way to randomize this procedure so that each of the following outputs is returned with equal probability?

>>> [array([0, 1]), array([2, 3, 4]), array([5, 6, 7])]
>>> [array([0, 1, 2]), array([3, 4]), array([5, 6, 7])]
>>> [array([0, 1, 2]), array([3, 4, 5]), array([6, 7])]

I am interested in a general solution that also works for other combinations of array size and number of pieces, for example:

arr = np.array([0,1,2,3,4,5,6,7,8,9])
pieces = 6
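
For reference, here is a small sketch of what the deterministic split produces for this case, and how many distinct orderings of the piece sizes a randomized version would have to choose from. The names default_split and n_arrangements are only illustrative.

```python
from math import comb

import numpy as np

arr = np.array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
pieces = 6

# Deterministic behaviour: the larger pieces always come first.
default_split = np.array_split(arr, pieces)
print([p.tolist() for p in default_split])
# [[0, 1], [2, 3], [4, 5], [6, 7], [8], [9]]

# arr.size % pieces pieces get one extra element; count the ways to
# place those larger pieces among the `pieces` positions.
n_arrangements = comb(pieces, arr.size % pieces)
print(n_arrangements)
# 15
```

So for 10 elements and 6 pieces there are four pieces of size 2 and two of size 1, and the randomized split should return each of the 15 orderings with the same probability.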

CodePudding user response:

def random_arr_split(arr, n):
    # NumPy doc: For an array of length l that should be split into n sections,
    # it returns l % n sub-arrays of size l//n + 1 and the rest of size l//n
    piece_lens = [arr.size // n + 1] * (arr.size % n) + [arr.size // n] * (n - arr.size % n)
    piece_lens_shuffled = np.random.permutation(piece_lens)
    
    # drop the last element, which is the end of the array
    # otherwise getting an empty array at the end
    split_indices = np.cumsum(piece_lens_shuffled)[:-1]
    return np.array_split(arr, split_indices)
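
A quick way to convince yourself that the shuffled-lengths approach gives the three outputs from the question about equally often is to count the size patterns over many runs. The sketch below is a self-contained variant of the same idea, not the exact function above: the name random_arr_split_rng and its rng parameter are assumptions made here for illustration, and it uses np.random.default_rng so the run can be seeded.

```python
import numpy as np


def random_arr_split_rng(arr, n, rng=None):
    """Split arr into n order-preserving pieces with shuffled piece sizes.

    Hypothetical variant for illustration; rng is an optional
    numpy.random.Generator so results can be reproduced.
    """
    rng = np.random.default_rng() if rng is None else rng
    base, extra = divmod(arr.size, n)
    # `extra` pieces of size base + 1, the remaining n - extra of size base
    piece_lens = np.array([base + 1] * extra + [base] * (n - extra))
    rng.shuffle(piece_lens)  # in-place shuffle of the piece sizes
    # Cumulative sums are the cut points; the last one equals arr.size
    # and is dropped so no empty trailing piece is produced.
    return np.array_split(arr, np.cumsum(piece_lens)[:-1])


arr = np.arange(8)
rng = np.random.default_rng(0)

# Tally how often each pattern of piece sizes occurs.
counts = {}
for _ in range(3000):
    parts = random_arr_split_rng(arr, 3, rng)
    sizes = tuple(len(p) for p in parts)
    counts[sizes] = counts.get(sizes, 0) + 1

print(counts)
```

With 8 elements and 3 pieces, only the patterns (2, 3, 3), (3, 2, 3) and (3, 3, 2) should appear, each roughly a third of the time. Shuffling the list of lengths picks a uniformly random position for each short piece, which is why the outcomes are equally likely.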