How do I control the magnitude at which I shuffle my dataset

Time:01-16

I have a dataset X in which the data points (rows) are in a particular order. To shuffle X completely, I use something like this:

        shufX = torch.randperm(len(X))
        X=X[shufX]

Say I only want to shuffle mildly (maybe shift the positions of a few data points) rather than shuffle everything. I would like a parameter p such that p=0 does not shuffle at all and p=1 shuffles completely, like the code above. That way I can adjust the amount of shuffling from mild to extensive.

I attempted this but realized it could result in duplicate data points, which is not what I want.

    p = 0.1
    mask = torch.bernoulli(p * torch.ones(len(X))).bool()
    shufX = torch.randperm(len(X))
    X1 = X[shufX]
    C = torch.where(mask, X, X1)
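To see why this produces duplicates, here is a small pure-Python illustration. The list, the fixed permutation and the mask are made-up stand-ins for X, torch.randperm and the Bernoulli mask; the list comprehension plays the role of the torch.where call in the attempt above.

```python
X = ["a", "b", "c", "d"]
perm = [1, 0, 3, 2]                 # fixed stand-in for a random permutation
X1 = [X[j] for j in perm]           # fully shuffled copy: ['b', 'a', 'd', 'c']
mask = [True, False, True, False]   # stand-in for the Bernoulli mask

# Element-wise choice between the original and the shuffled copy.
C = [x if m else x1 for m, x, x1 in zip(mask, X, X1)]
print(C)  # ['a', 'a', 'c', 'c']
```

Position 0 keeps "a" from X, while position 1 takes "a" from the shuffled copy, so "a" appears twice and "b" is lost. Mixing two orderings element by element does not, in general, give a permutation.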

CodePudding user response:

Create a shuffle function which only swaps a limited number of items.

    import numpy as np
    from random import randrange, seed

    def shuffle(arr_in, weight=1.0):
        """Partially shuffle arr_in in place by swapping random pairs."""
        count = len(arr_in)
        n = int(count * weight)  # Set the number of iterations
        for _ in range(n):
            ix0 = randrange(count)
            ix1 = randrange(count)
            # Swap the items at the two chosen indices
            arr_in[ix0], arr_in[ix1] = arr_in[ix1], arr_in[ix0]

    seed(1234)
    arr = np.arange(50)
    shuffle(arr, 0.25)
    print(arr)

    # [ 7 15 42  3  4 44 28  0  8 29 10 11 12 13 14 22 16 17 18 19 20 21
    #   1 23 24 25 26 27 49  9 41 31 32 33 34 35 36  5 38 30 40 39  2 43
    #  37 45 46 47 48  6]

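Even with a weight of 1.0, some of the items will (on average) not be moved. Each swap picks its two indices at random, so for a large array any given index has roughly an e^-2 ≈ 13.5% chance of never being picked, and an item that was moved can also be swapped back. You can play with the weight (values above 1.0 also work) to get the behaviour you need.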
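Since the question is about rows of a tensor, it may be safer to shuffle an index permutation and apply it once, rather than swapping rows in place: tuple-swapping rows of a 2-D NumPy array or torch tensor can silently duplicate rows, because row indexing returns views. The following is a minimal sketch of that idea, not the method from the answer above. The function name `partial_permutation` and its `rng` parameter are hypothetical. Each position is selected with probability p, and only the selected positions are shuffled among themselves, so p=0 returns the identity and p=1 is an ordinary full shuffle.

```python
import random

def partial_permutation(n, p, rng=random):
    """Return a permutation of range(n) in which only a random subset moves.

    Hypothetical helper: each position is selected independently with
    probability p, and the selected positions are shuffled among themselves.
    """
    perm = list(range(n))
    # rng.random() lies in [0, 1), so p=0 selects nothing and p=1 selects all.
    chosen = [i for i in range(n) if rng.random() < p]
    shuffled = chosen[:]
    rng.shuffle(shuffled)
    # Unselected positions keep their own index; selected ones trade places.
    for dst, src in zip(chosen, shuffled):
        perm[dst] = src
    return perm

rng = random.Random(1234)
perm = partial_permutation(10, 0.3, rng)
print(perm)  # a permutation of 0..9
```

Because the result is always a true permutation, indexing with it (for example `X[torch.tensor(perm)]`) reorders the rows without duplicating any. The fraction of rows that actually move will typically be a little below p, since a shuffled subset can leave some of its members where they were.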
