I have a dataset X where the data points (rows) are in a particular order. To shuffle X completely, I use something like this:
shufX = torch.randperm(len(X))
X=X[shufX]
Say I just want to shuffle mildly (maybe shift the positions of a few data points) without shuffling everything. I would like a parameter p such that when p=0 it does not shuffle at all, and when p=1 it shuffles completely, like the code above. This way, I can adjust the amount of shuffling from mild to extensive.
I attempted this but realized it could result in duplicate data points, which is not what I want.
p = 0.1
mask = torch.bernoulli(p*torch.ones(len(X))).bool()
shufX = torch.randperm(len(X))
X1=X[shufX]
C = torch.where(mask, X, X1)
CodePudding user response:
Create a shuffle function which only swaps a limited number of items.
import numpy as np
from random import randrange, seed

def shuffle( arr_in, weight = 1.0 ):
    count = len( arr_in )
    n = int( count * weight ) # Set the number of iterations
    for ix in range( n ):
        ix0 = randrange( count )
        ix1 = randrange( count )
        # Swap the items from the two chosen indices
        arr_in[ ix0 ], arr_in[ ix1 ] = arr_in[ ix1 ], arr_in[ ix0 ]

seed ( 1234 )
arr = np.arange(50)
shuffle( arr, 0.25 )
print( arr )
# [ 7 15 42 3 4 44 28 0 8 29 10 11 12 13 14 22 16 17 18 19 20 21
# 1 23 24 25 26 27 49 9 41 31 32 33 34 35 36 5 38 30 40 39 2 43
# 37 45 46 47 48 6]
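Even with a weight of 1.0, some of the items will not be moved on average. Each swap picks both indices at random, so an item can be missed entirely (with count swaps, roughly e^-2, about 13.5%, of the items are never picked), and a swap can also pick the same index twice. You can adjust weight to get the behaviour you need; values above 1.0 are allowed and simply perform more swaps.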
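If X has more than one dimension (each data point is a row), swapping rows in place with tuple assignment can itself produce duplicates, because the right-hand side holds views rather than copies. One way around that is to shuffle an index array and then index X once. The sketch below is an alternative to the swap loop, not part of the original answer: the name partial_permutation and its rng parameter are made up for illustration. Each position is selected with probability p, and only the selected positions are permuted among themselves, so p=0 gives the identity and p=1 gives a full random permutation.

```python
import numpy as np

def partial_permutation(n, p, rng=None):
    """Return an index permutation of range(n) in which each position
    takes part in the shuffle with probability p (hypothetical helper)."""
    if rng is None:
        rng = np.random.default_rng()
    # Pick the positions that are allowed to move.
    # rng.random() is in [0, 1), so p=0 picks nothing and p=1 picks everything.
    chosen = np.flatnonzero(rng.random(n) < p)
    # Start from the identity and permute only the chosen positions
    # among themselves, so every index still appears exactly once.
    perm = np.arange(n)
    perm[chosen] = rng.permutation(chosen)
    return perm

# Usage: index the data once with the permuted indices.
X = np.arange(20).reshape(10, 2)
perm = partial_permutation(len(X), p=0.2, rng=np.random.default_rng(1234))
X_shuffled = X[perm]
```

Because perm is always a true permutation, no row is duplicated or dropped. A selected position can still land back where it started, so the fraction of rows that actually move is at most about p. For a torch tensor, the same index array should work after conversion, for example X[torch.from_numpy(perm)].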