randomly sampling arrays - issue with numpy.delete

Time:06-14

I have two arrays, x_1g and x_2g. I want to randomly sample 10% of the rows of each array, remove them, and insert them into the other array. The final arrays should therefore have the same shape as the initial ones, with 10% of each array's rows coming from the other array. I have been trying this with the code below, but my arrays keep increasing in length, which means I haven't properly deleted the sampled 10% from each array.

n = len(x_1g)
n2 = round(n/10)

ints1 = np.random.choice(n, n2)

x_1_replace = x_1g[ints1,:]
x_1 = np.delete(x_1g, ints1, 0)

x_2_replace = x_2g[ints1,:]
x_2 = np.delete(x_2g, ints1, 0)

My arrays x_1g and x_2g have shapes (1502983, 10)

x_1g.shape
>> (1502983, 10)

x_1_replace.shape 
>> (150298, 10)

so when I remove the 10% data (x_1_replace) from my original array (x_1g) I should get the array shape:

1502983-150298 = 1352685

However when I check the shape of my array x_1 I get:

x_1.shape
>> (1359941, 10)

I'm not sure what is going on here so if anyone has any suggestions please let me know!!

CodePudding user response:

The problem is that ints1 = np.random.choice(n, n2) draws n2 numbers between 0 and n-1 with replacement (the default), so there is no guarantee that you get n2 different numbers. With a sample this large you are almost certainly generating some duplicates. If the same index is passed to np.delete several times, that row is still deleted only once, whereas x_1g[ints1,:] returns it once per occurrence. You can check this by counting the unique values in ints1:

np.unique(ints1).shape

You'll see that it does not match n2 (150298). In your example you'll get (143042,), which accounts for the 1502983 - 143042 = 1359941 rows left in x_1.
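
A small reproduction makes the effect easier to see. This is only a minimal sketch on a made-up 20-row array: the name demo, the sizes, and the hard-coded indices are assumptions for illustration, not the original data. It shows that indexing keeps a repeated index while np.delete removes each distinct row once.

```python
import numpy as np

# Hypothetical stand-in for x_1g: 20 rows, 2 columns
demo = np.arange(40).reshape(20, 2)

# Indices containing a repeat, as np.random.choice can return
# (it samples with replacement by default)
idx = np.array([3, 7, 3, 12])

sampled = demo[idx, :]               # indexing keeps the duplicate: 4 rows
remaining = np.delete(demo, idx, 0)  # each distinct row is removed once: 17 rows

n_unique = np.unique(idx).shape[0]   # 3 distinct indices
print(sampled.shape, remaining.shape, n_unique)
```

Here the sampled and remaining rows add up to 21, one more than the original 20, which is the same kind of growth seen in the question.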

There's probably more than one way to ensure that you get n2 different indices. Here is one example:

n = len(x_1g)
n2 = round(n/10)

ints1 = np.arange(n)  # generating an array [0 ... n-1]
np.random.shuffle(ints1)  # shuffle it
ints1 = ints1[:n2]  # take the first n2 values

x_1_replace = x_1g[ints1,:]
x_1 = np.delete(x_1g, ints1, 0)

x_2_replace = x_2g[ints1,:]
x_2 = np.delete(x_2g, ints1, 0)

Now you can check:

x_1.shape
# (1352685, 10)
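
Another option, sketched here under stated assumptions, is to let NumPy draw the indices without replacement and then complete the swap described in the question. np.random.default_rng and choice(..., replace=False) are existing NumPy calls; the small arrays, the seed, the second index set ints2, and the names x_1_new and x_2_new are hypothetical. A separate index set is drawn for each array here, although reusing a single set as in the code above would work as well.

```python
import numpy as np

rng = np.random.default_rng(0)  # seeded only to make the sketch repeatable

# Hypothetical small stand-ins for x_1g and x_2g
x_1g = np.zeros((50, 3))
x_2g = np.ones((50, 3))

n = len(x_1g)
n2 = round(n / 10)  # 10% of the rows

# replace=False guarantees n2 distinct row indices
ints1 = rng.choice(n, size=n2, replace=False)
ints2 = rng.choice(len(x_2g), size=n2, replace=False)

x_1_replace = x_1g[ints1, :]  # rows leaving x_1g
x_2_replace = x_2g[ints2, :]  # rows leaving x_2g

# Remove the sampled rows, then append the rows taken from the other array
x_1_new = np.concatenate([np.delete(x_1g, ints1, 0), x_2_replace])
x_2_new = np.concatenate([np.delete(x_2g, ints2, 0), x_1_replace])

print(x_1_new.shape, x_2_new.shape)
```

With the legacy interface, np.random.choice(n, n2, replace=False) should give distinct indices in the same way.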