How to shuffle groups of values but not within groups in numpy?

I need to shuffle the rows of an array in which the first value of each row is a day number. Rows with the same day number must be kept together. A group may contain 1, 2, 3 or 4 rows, and every row has the same number of values. The examples below should make this clearer.

I have this:

a = np.array([
    [0, 0.02, 0.03, 0.04],
    [0, 0.02, 0.03, 0.04],
    [0, 0.02, 0.03, 0.04],

    [1, 0.12, 0.13, 0.14],

    [2, 0.22, 0.23, 0.24],
    [2, 0.22, 0.23, 0.24],

    [3, 0.32, 0.33, 0.34],
    [3, 0.32, 0.33, 0.34],
    [3, 0.32, 0.33, 0.34],
    [3, 0.32, 0.33, 0.34]
])

I need to have this:

a = np.array([
    [3, 0.32, 0.33, 0.34],
    [3, 0.32, 0.33, 0.34],
    [3, 0.32, 0.33, 0.34],
    [3, 0.32, 0.33, 0.34],
        
    [0, 0.02, 0.03, 0.04],
    [0, 0.02, 0.03, 0.04],
    [0, 0.02, 0.03, 0.04],

    [2, 0.22, 0.23, 0.24],
    [2, 0.22, 0.23, 0.24],
    
    [1, 0.12, 0.13, 0.14]    
])

CodePudding user response：

Assuming groups are not split in the input array, you can apply the following strategy:

# Find the groups and the number of items in each group
unique, srcCounts = np.unique(a[:,0], return_counts=True)
shuffledGroupPos = np.random.permutation(len(unique))

# Compute the source start/end group indices
srcEnd = np.cumsum(srcCounts)
srcStart = srcEnd - srcCounts

# Find the destination start/end group indices
dstCounts = srcCounts[shuffledGroupPos]
dstEnd = np.cumsum(dstCounts)
dstStart = dstEnd - dstCounts

# Remap the source start/end group indices regarding the destination indices
srcStart = srcStart[shuffledGroupPos]
srcEnd = srcEnd[shuffledGroupPos]

# Output array
result = np.empty_like(a)

# Loop over the groups.
# This loop can be avoided, but the code is much simpler with it.
for i in range(unique.size):
    result[dstStart[i]:dstEnd[i]] = a[srcStart[i]:srcEnd[i]]
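The comment above mentions that the loop can be avoided. One possible loop-free variant, as a rough sketch, builds a single row-index array with np.repeat and indexes the input once. It keeps the same assumption that groups are not split in the input. The function name shuffle_groups_vectorized and the rng parameter are illustrative names, not part of the original answer.

```python
import numpy as np

def shuffle_groups_vectorized(a, rng=None):
    """Shuffle contiguous groups of rows (keyed by column 0) without a Python loop.

    Assumes every group occupies one contiguous block of rows.
    The function name and the rng parameter are illustrative only.
    """
    rng = np.random.default_rng() if rng is None else rng

    # First row index and size of each group (valid only for contiguous groups)
    _, start, counts = np.unique(a[:, 0], return_index=True, return_counts=True)

    # Random order in which the groups will appear in the output
    order = rng.permutation(len(start))
    src_start = start[order]
    dst_counts = counts[order]

    # Position of each output row inside its own group: 0, 1, ..., count - 1
    dst_start = np.cumsum(dst_counts) - dst_counts
    within = np.arange(len(a)) - np.repeat(dst_start, dst_counts)

    # Source row for every output row, then a single fancy-indexing gather
    row_idx = np.repeat(src_start, dst_counts) + within
    return a[row_idx]

demo = np.array([
    [0, 0.02], [0, 0.03],
    [1, 0.12],
    [2, 0.22], [2, 0.23], [2, 0.24],
])
print(shuffle_groups_vectorized(demo))
```

Whether this is actually faster than the loop depends on the number of groups; with only a handful of groups the loop version is likely just as good and easier to read.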

CodePudding user response：

This approach, which uses Python's random.sample to permute a list of arrays, is not fast but is easier to follow. It only works if the rows are already arranged in contiguous blocks, one block per group.

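import random
import numpy as np

random.seed(25)     # used for reproducibility only

groups = a[:,0].astype('int')
# Row indices where the group value changes, i.e. where each new block starts
idx = (groups[1:] ^ groups[:-1]).nonzero()[0] + 1
np.vstack(random.sample(np.split(a, idx), len(idx) + 1))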

Output

array([[3.  , 0.32, 0.33, 0.34],
       [3.  , 0.32, 0.33, 0.34],
       [3.  , 0.32, 0.33, 0.34],
       [3.  , 0.32, 0.33, 0.34],
       [0.  , 0.02, 0.03, 0.04],
       [0.  , 0.02, 0.03, 0.04],
       [0.  , 0.02, 0.03, 0.04],
       [2.  , 0.22, 0.23, 0.24],
       [2.  , 0.22, 0.23, 0.24],
       [1.  , 0.12, 0.13, 0.14]])
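Both answers assume that each group already sits in one contiguous block. If that cannot be guaranteed, one possible alternative is to give every group a random rank and stable-sort the rows by that rank. The sketch below is not from either answer; the function name shuffle_groups_any_order and the rng parameter are made up for illustration.

```python
import numpy as np

def shuffle_groups_any_order(a, rng=None):
    """Shuffle groups of rows (keyed by column 0), even if a group is split in the input.

    Rows of the same group end up adjacent and keep their original relative order.
    The function name and the rng parameter are illustrative only.
    """
    rng = np.random.default_rng() if rng is None else rng

    # inverse[i] is the index of row i's group among the sorted unique keys
    uniq, inverse = np.unique(a[:, 0], return_inverse=True)

    # Assign each group a distinct random rank, then look it up for every row
    rank_per_row = rng.permutation(len(uniq))[inverse]

    # A stable sort keeps the original order of rows that share a rank
    return a[np.argsort(rank_per_row, kind="stable")]

# Day 0 and day 2 are interleaved here on purpose
demo = np.array([
    [2, 0.22], [0, 0.02], [2, 0.23], [1, 0.12], [0, 0.03],
])
print(shuffle_groups_any_order(demo))
```

The sort makes this O(n log n) rather than linear, which is unlikely to matter for arrays of this size.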