Adding randomization to the numpy function array_split

Suppose we have an array arr and want to split it into pieces while preserving the order of its elements. This is easy to do with np.array_split:

import numpy as np
arr = np.array([0,1,2,3,4,5,6,7,8])
pieces = 3
np.array_split(arr,pieces)
>>> [array([0, 1, 2]), array([3, 4, 5]), array([6, 7, 8])]

If arr.size % pieces != 0, the pieces returned by np.array_split have unequal sizes, and the larger pieces always come first:

arr = np.array([0,1,2,3,4,5,6,7])
pieces = 3
np.array_split(arr,pieces)
>>> [array([0, 1, 2]), array([3, 4, 5]), array([6, 7])]

What is the best way to randomize the position of the smaller piece, so that each of the following outputs is returned with equal probability?

>>> [array([0, 1]), array([2, 3, 4]), array([5, 6, 7])]
>>> [array([0, 1, 2]), array([3, 4]), array([5, 6, 7])]
>>> [array([0, 1, 2]), array([3, 4, 5]), array([6, 7])]

I am interested in a general solution that also works for other combinations of array size and number of pieces, for example:

arr = np.array([0,1,2,3,4,5,6,7,8,9])
pieces = 6

Answer:

import numpy as np

def random_arr_split(arr, n):
    # NumPy doc: For an array of length l that should be split into n sections,
    # it returns l % n sub-arrays of size l//n + 1 and the rest of size l//n
    piece_lens = [arr.size // n + 1] * (arr.size % n) + [arr.size // n] * (n - arr.size % n)
    # shuffle the piece lengths so the larger pieces land in random positions
    piece_lens_shuffled = np.random.permutation(piece_lens)

    # drop the last element, which is the end of the array
    # otherwise getting an empty array at the end
    split_indices = np.cumsum(piece_lens_shuffled)[:-1]
    return np.array_split(arr, split_indices)
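
For reproducible results, here is a minimal sketch of the same idea using NumPy's Generator API instead of the global np.random state. The name random_arr_split_rng and the optional rng parameter are my own additions for illustration, not part of the function above. It also tallies the piece sizes over many trials on the 8-element example from the question, as a rough check that the three arrangements come up about equally often.

```python
import numpy as np
from collections import Counter

def random_arr_split_rng(arr, n, rng=None):
    # rng is a hypothetical optional parameter; pass a seeded Generator
    # to make the split reproducible
    rng = np.random.default_rng() if rng is None else rng
    arr = np.asarray(arr)

    # base size of every piece, and how many pieces get one extra element
    base, extra = divmod(arr.size, n)
    piece_lens = np.full(n, base)
    piece_lens[:extra] += 1

    # shuffle in place so the larger pieces land in random positions
    rng.shuffle(piece_lens)

    # cumulative lengths are the split points; the last one is the array end
    return np.array_split(arr, np.cumsum(piece_lens)[:-1])

rng = np.random.default_rng(0)

# the second example from the question: 10 elements, 6 pieces
parts = random_arr_split_rng(np.arange(10), 6, rng)
print([p.tolist() for p in parts])

# rough uniformity check on the 8-element, 3-piece example
counts = Counter(
    tuple(len(p) for p in random_arr_split_rng(np.arange(8), 3, rng))
    for _ in range(3000)
)
print(counts)
```

Shuffling the list of piece lengths uniformly gives every distinct arrangement of sizes the same probability, which is why the three counts should come out close to each other.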