I need to shuffle the rows of an array in which the first value of each row is a day number. Rows with the same day number must stay together. Groups may contain 1, 2, 3 or 4 rows, and every row has the same number of values. I hope the examples below make this clearer.
I have this:
a = np.array([
[0, 0.02, 0.03, 0.04],
[0, 0.02, 0.03, 0.04],
[0, 0.02, 0.03, 0.04],
[1, 0.12, 0.13, 0.14],
[2, 0.22, 0.23, 0.24],
[2, 0.22, 0.23, 0.24],
[3, 0.32, 0.33, 0.34],
[3, 0.32, 0.33, 0.34],
[3, 0.32, 0.33, 0.34],
[3, 0.32, 0.33, 0.34]
])
I need to have this:
a = np.array([
[3, 0.32, 0.33, 0.34],
[3, 0.32, 0.33, 0.34],
[3, 0.32, 0.33, 0.34],
[3, 0.32, 0.33, 0.34],
[0, 0.02, 0.03, 0.04],
[0, 0.02, 0.03, 0.04],
[0, 0.02, 0.03, 0.04],
[2, 0.22, 0.23, 0.24],
[2, 0.22, 0.23, 0.24],
[1, 0.12, 0.13, 0.14]
])
CodePudding user response:
Assuming the rows are sorted by day number, so that each group forms a single contiguous block in the input array, you can apply the following strategy:
import numpy as np
# Find the groups and the number of items in each group
unique, srcCounts = np.unique(a[:,0], return_counts=True)
# Draw a random order for the groups
shuffledGroupPos = np.random.permutation(len(unique))
# Compute the source start/end group indices
srcEnd = np.cumsum(srcCounts)
srcStart = srcEnd - srcCounts
# Find the destination start/end group indices
dstCounts = srcCounts[shuffledGroupPos]
dstEnd = np.cumsum(dstCounts)
dstStart = dstEnd - dstCounts
# Remap the source start/end group indices regarding the destination indices
srcStart = srcStart[shuffledGroupPos]
srcEnd = srcEnd[shuffledGroupPos]
# Output array
result = np.empty_like(a)
# Loop iterating over the groups.
# While this loop can be avoided, the code is much simpler with it.
for i in range(unique.size):
    result[dstStart[i]:dstEnd[i]] = a[srcStart[i]:srcEnd[i]]
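For reference, here is a minimal sketch of what a loop-free variant could look like. It is not part of the code above: the function name shuffle_groups and the rng parameter are made up for illustration, and it assumes, as before, that each group is one contiguous block of rows. The idea is to build one index array that says, for every destination row, which source row to copy.

```python
import numpy as np

def shuffle_groups(a, rng=None):
    """Hypothetical helper: shuffle contiguous row groups keyed by column 0."""
    rng = np.random.default_rng() if rng is None else rng
    keys = a[:, 0]
    # Start index of every contiguous block (assumes groups are not split)
    starts = np.flatnonzero(np.r_[True, keys[1:] != keys[:-1]])
    counts = np.diff(np.r_[starts, len(a)])
    # Random order in which the blocks will appear in the output
    order = rng.permutation(len(starts))
    new_counts = counts[order]
    # For every destination row: start of its source block ...
    src_block_start = np.repeat(starts[order], new_counts)
    # ... plus its offset inside that block
    dst_block_start = np.repeat(np.cumsum(new_counts) - new_counts, new_counts)
    offsets = np.arange(len(a)) - dst_block_start
    return a[src_block_start + offsets]

a = np.array([
    [0, 0.02], [0, 0.03],
    [1, 0.12],
    [2, 0.22], [2, 0.23], [2, 0.24],
])
print(shuffle_groups(a, np.random.default_rng(0)))
```

Passing a seeded generator only makes the result repeatable; rows keep their original order inside each group.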
CodePudding user response:
This approach uses Python's random.sample to permute a list of arrays. It is not fast, but it is easier to follow. It only works if each group already forms a contiguous block of rows.
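import random
import numpy as np
random.seed(25) # used for reproducibility only
groups = a[:,0].astype('int')
# Indices where the day number changes, i.e. where each new group starts
idx = (groups[1:] ^ groups[:-1]).nonzero()[0] + 1
# Split into one array per group, shuffle the list, then stack it back
np.vstack(random.sample(np.split(a, idx), len(idx) + 1))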
Output
array([[3. , 0.32, 0.33, 0.34],
[3. , 0.32, 0.33, 0.34],
[3. , 0.32, 0.33, 0.34],
[3. , 0.32, 0.33, 0.34],
[0. , 0.02, 0.03, 0.04],
[0. , 0.02, 0.03, 0.04],
[0. , 0.02, 0.03, 0.04],
[2. , 0.22, 0.23, 0.24],
[2. , 0.22, 0.23, 0.24],
[1. , 0.12, 0.13, 0.14]])
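If the rows of one day might be scattered through the input, neither approach above applies directly. The following is a rough sketch of a different technique (a random rank per group followed by a stable sort), not something taken from the answers; the name shuffle_groups_any_order and the rng parameter are hypothetical.

```python
import numpy as np

def shuffle_groups_any_order(a, rng=None):
    """Hypothetical helper: group rows by column 0, then shuffle the groups."""
    rng = np.random.default_rng() if rng is None else rng
    # inverse[i] is the index of row i's day number among the sorted unique days
    unique, inverse = np.unique(a[:, 0], return_inverse=True)
    inverse = inverse.ravel()
    # Give every group a random rank
    rank = rng.permutation(len(unique))
    # A stable sort by rank gathers each group and keeps its rows in input order
    return a[np.argsort(rank[inverse], kind="stable")]

scattered = np.array([
    [2, 0.22], [0, 0.02], [2, 0.23], [1, 0.12], [0, 0.03],
])
print(shuffle_groups_any_order(scattered, np.random.default_rng(1)))
```

The sort makes this O(n log n) rather than linear, which is unlikely to matter for arrays of this size.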