How to change random index positions in a numpy array

I have a numpy array that looks like this:

array([[0.5, 0.2, 0.6],
       [0.8, 0.1, 0.3],
       [0.4, 0.5, 0.4],
       [0.3, 0.2, 0.9]])

I want to replace 50% of the values in the 3rd column with random values. How can I do this efficiently? This operation will have to be performed hundreds of thousands of times, so efficiency is very important here. The output could look like this:

array([[0.5, 0.2, 0.2],
       [0.8, 0.1, 0.3],
       [0.4, 0.5, 0.4],
       [0.3, 0.2, 0.1]])

I first thought about isolating this column, then replacing some of the values, and then moving this column back into the original matrix.

last_column = array[:, 2]
# change_values_randomly is a placeholder for the step I don't know how to write
last_column = change_values_randomly(last_column)
array = np.c_[array[:, :2], last_column]

How do I change 50% of these values randomly?

CodePudding user response:

Try this:

import numpy as np

arr = np.array([[0.5, 0.2, 0.6],
                [0.8, 0.1, 0.3],
                [0.4, 0.5, 0.4],
                [0.3, 0.2, 0.9]])
# randomly select half of the rows (2 of 4), without repetition
rows = np.random.choice(4, size=2, replace=False)
# replace the selected values in the last column with random values
arr[rows, 2] = np.random.rand(2)
arr
array([[0.5       , 0.2       , 0.81496687],
       [0.8       , 0.1       , 0.3       ],
       [0.4       , 0.5       , 0.18514918],
       [0.3       , 0.2       , 0.9       ]])

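Using the Generator API (np.random.default_rng()) is faster than the legacy np.random.choice for sampling without replacement:

rng = np.random.default_rng()
half = len(arr) // 2
rows = rng.choice(len(arr), size=half, replace=False)
arr[rows, 2] = rng.random(half)  # one new value per selected row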
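
Since the question says this will run hundreds of thousands of times, it may be worth creating the Generator once and reusing it, rather than calling np.random.default_rng() on every iteration. A minimal sketch, assuming a hypothetical helper named randomize_half_of_column (not part of NumPy) and a seed that is only there to make the run reproducible:

```python
import numpy as np

def randomize_half_of_column(arr, col, rng):
    """Overwrite half of the rows in column `col` with uniform random values, in place.

    Hypothetical helper for illustration; returns the row indices that were changed.
    """
    n_rows = arr.shape[0]
    half = n_rows // 2
    # Pick `half` distinct row indices.
    rows = rng.choice(n_rows, size=half, replace=False)
    # Draw exactly as many new values as rows were selected.
    arr[rows, col] = rng.random(half)
    return rows

rng = np.random.default_rng(0)  # seed is an assumption, for reproducibility only
arr = np.array([[0.5, 0.2, 0.6],
                [0.8, 0.1, 0.3],
                [0.4, 0.5, 0.4],
                [0.3, 0.2, 0.9]])
original = arr.copy()
rows = randomize_half_of_column(arr, 2, rng)
```

The same rng object can then be passed to every call in the loop, so the generator setup cost is paid only once.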

Benchmark:

arr = np.linspace(0,1,300000).reshape(-1,3)
length = len(arr)
half = length//2

%timeit np.random.default_rng().choice(length, size=half, replace=False)
# 2.31 ms ± 808 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

%timeit np.random.permutation(length)[:half]
# 4.14 ms ± 313 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

%timeit np.random.choice(length, size=half, replace=False)
# 4.14 ms ± 313 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

%timeit np.random.default_rng().permutation(length)[:half]
# 3.69 ms ± 173 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
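
The %timeit lines above need IPython. If you want to repeat the comparison in a plain Python script, something like the following should work. It is only a sketch: the repeat counts are arbitrary choices, the Generator is created once outside the timed calls (unlike the benchmark above), and the numbers will differ by machine and NumPy version.

```python
import timeit
import numpy as np

length = 100_000
half = length // 2
rng = np.random.default_rng()  # created once, outside the timed statements

candidates = {
    "Generator.choice": lambda: rng.choice(length, size=half, replace=False),
    "Generator.permutation": lambda: rng.permutation(length)[:half],
    "legacy np.random.choice": lambda: np.random.choice(length, size=half, replace=False),
    "legacy np.random.permutation": lambda: np.random.permutation(length)[:half],
}

results = {}
for name, fn in candidates.items():
    # Best of 3 repeats of 10 calls each, converted to seconds per call.
    best = min(timeit.repeat(fn, number=10, repeat=3)) / 10
    results[name] = best
    print(f"{name}: {best * 1e3:.3f} ms per call")
```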

CodePudding user response:

Use numpy.random.permutation to generate random indices without repetition, take the first half, and assign a random array of the same size to those positions. Since r[:, -1] is a view, the assignment modifies r in place:

>>> r = np.array([[0.5, 0.2, 0.6],
... [0.8, 0.1, 0.3],
... [0.4, 0.5, 0.4],
... [0.3, 0.2, 0.9]])
>>> last = r[:, -1]
>>> last[np.random.permutation(last.size)[:last.size // 2]] = np.random.rand(last.size // 2)
>>> r
array([[0.5       , 0.2       , 0.6       ],
       [0.8       , 0.1       , 0.56898452],
       [0.4       , 0.5       , 0.4       ],
       [0.3       , 0.2       , 0.67314702]])
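
The reason the session above changes r without ever assigning back to it is that a basic slice such as r[:, -1] shares memory with r. A small sketch to check that yourself (the fixed indices [0, 3] and the values -1.0 and 99.0 are arbitrary and only for illustration):

```python
import numpy as np

r = np.array([[0.5, 0.2, 0.6],
              [0.8, 0.1, 0.3],
              [0.4, 0.5, 0.4],
              [0.3, 0.2, 0.9]])

last = r[:, -1]                      # basic slice: a view, no copy
is_view = np.shares_memory(last, r)  # True

last[[0, 3]] = -1.0                  # index-array *assignment* writes through the view
wrote_through = bool(r[0, 2] == -1.0 and r[3, 2] == -1.0)

picked = r[[0, 3], 2]                # index-array *read* returns a copy
picked[:] = 99.0                     # r is not affected by this
```

So no column needs to be split off and glued back with np.c_; assigning through the slice, or directly with r[rows, 2] = ..., is enough.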

CodePudding user response:

You can use:

n = a.shape[0]  # a is the 2-D array from the question
idx = np.random.choice(n, size=n // 2, replace=False)
a[idx, 2] = -1  # or, for random values: np.random.rand(n // 2)

Example output:

[[ 0.5  0.2 -1. ]
 [ 0.8  0.1  0.3]
 [ 0.4  0.5 -1. ]
 [ 0.3  0.2  0.9]]