I have a numpy array that looks like this:
array([[0.5, 0.2, 0.6],
       [0.8, 0.1, 0.3],
       [0.4, 0.5, 0.4],
       [0.3, 0.2, 0.9]])
I want to change 50% of the values in the 3rd column to random values. How can I do this efficiently? This operation will be performed hundreds of thousands of times, so efficiency is very important here. The output could look like this:
array([[0.5, 0.2, 0.2],
       [0.8, 0.1, 0.3],
       [0.4, 0.5, 0.4],
       [0.3, 0.2, 0.1]])
My first idea was to isolate this column, replace some of its values, and then put the column back into the original matrix:
last_column = array[:, 2]
last_column = change_values_randomly(last_column)  # the part I don't know how to write
array = np.c_[array[:, :2], last_column]
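Isolating the column and putting it back may not be necessary: basic slicing such as `array[:, 2]` returns a view that shares memory with the original array, so writing into selected positions of that view updates the matrix in place and the `np.c_` step (which builds a new array) can be skipped. A minimal sketch, assuming a NumPy `Generator` for randomness; `change_values_randomly` is a hypothetical implementation of the helper named in the question, not a NumPy function:

```python
import numpy as np

def change_values_randomly(column, rng):
    """Hypothetical helper: overwrite a random half of `column` in place."""
    n = column.shape[0]
    idx = rng.choice(n, size=n // 2, replace=False)  # n // 2 distinct positions
    column[idx] = rng.random(n // 2)                 # one new value per position
    return column

arr = np.array([[0.5, 0.2, 0.6],
                [0.8, 0.1, 0.3],
                [0.4, 0.5, 0.4],
                [0.3, 0.2, 0.9]])
original = arr.copy()
rng = np.random.default_rng(0)

last_column = arr[:, 2]                     # a view, not a copy
print(np.shares_memory(last_column, arr))   # True
change_values_randomly(last_column, rng)    # writes through to arr
```

Because the view writes through, `arr` already holds the new values afterwards and no reassembly is needed.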
How do I change 50% of these values randomly?
Answer:
Try this:
import numpy as np
arr = np.array([[0.5, 0.2, 0.6],
                [0.8, 0.1, 0.3],
                [0.4, 0.5, 0.4],
                [0.3, 0.2, 0.9]])
# randomly select 2 distinct rows out of 4
rows = np.random.choice(4, size=2, replace=False)
# replace the values in the last column of those rows with random values
arr[rows, 2] = np.random.rand(2)
arr
array([[0.5 , 0.2 , 0.81496687],
       [0.8 , 0.1 , 0.3 ],
       [0.4 , 0.5 , 0.18514918],
       [0.3 , 0.2 , 0.9 ]])
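The snippet above hardcodes 4 rows and 2 replacements. A sketch of the same idea for an array with any number of rows, assuming "half" means `n // 2` rows (rounded down for odd `n`); the function name `randomize_half_last_column` is hypothetical:

```python
import numpy as np

def randomize_half_last_column(arr):
    """Replace column 2 in a random half of the rows, in place (legacy np.random API)."""
    n = arr.shape[0]
    k = n // 2                                         # rounded down for odd n
    rows = np.random.choice(n, size=k, replace=False)  # k distinct row indices
    arr[rows, 2] = np.random.rand(k)                   # one new value per chosen row
    return rows

np.random.seed(0)                            # optional: makes the run reproducible
arr = np.linspace(0, 1, 15).reshape(-1, 3)   # 5 rows x 3 columns
before = arr.copy()
rows = randomize_half_last_column(arr)
```

`replace=False` is what guarantees that `k` different rows are changed; with the default `replace=True` the same row could be drawn twice.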
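Using the Generator API is noticeably faster than np.random.choice (about 2.3 ms vs 4.1 ms per call in the benchmark below). Note that the number of random values must match the number of selected rows:
rng = np.random.default_rng()
length, half = len(arr), len(arr) // 2
rows = rng.choice(length, size=half, replace=False)
arr[rows, 2] = rng.random(half)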
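Since the question repeats this operation hundreds of thousands of times, it is probably worth creating the Generator once and reusing it, rather than calling `np.random.default_rng()` on every iteration as the benchmark lines below do (each call builds and seeds a fresh generator). A minimal sketch, assuming the same 3-column layout; the function name `randomize_half`, the seed, and the loop count are illustrative:

```python
import numpy as np

def randomize_half(arr, rng):
    """Overwrite column 2 in a random half of the rows, in place."""
    n = arr.shape[0]
    half = n // 2
    rows = rng.choice(n, size=half, replace=False)
    arr[rows, 2] = rng.random(half)   # size must match len(rows)

rng = np.random.default_rng(12345)           # created once, reused below
arr = np.linspace(0, 1, 300).reshape(-1, 3)  # 100 rows x 3 columns
for _ in range(1000):
    randomize_half(arr, rng)
```

Reusing one seeded Generator also makes the whole sequence of updates reproducible.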
Benchmark:
arr = np.linspace(0, 1, 300000).reshape(-1, 3)  # 100000 rows x 3 columns
length = len(arr)
half = length//2
%timeit np.random.default_rng().choice(length, size=half, replace=False)
# 2.31 ms ± 808 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
%timeit np.random.permutation(length)[:half]
# 4.14 ms ± 313 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
%timeit np.random.choice(length, size=half, replace=False)
# 4.14 ms ± 313 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
%timeit np.random.default_rng().permutation(length)[:half]
# 3.69 ms ± 173 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
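`%timeit` is an IPython magic. To repeat the comparison in a plain Python script, something along these lines with the standard-library `timeit` module should work. Unlike the lines above, this sketch reuses one Generator instead of constructing one per call, and it prints no reference numbers because timings depend on the machine and NumPy version:

```python
import timeit
import numpy as np

length = 100_000
half = length // 2
rng = np.random.default_rng()

candidates = {
    "Generator.choice": lambda: rng.choice(length, size=half, replace=False),
    "np.random.permutation": lambda: np.random.permutation(length)[:half],
    "np.random.choice": lambda: np.random.choice(length, size=half, replace=False),
    "Generator.permutation": lambda: rng.permutation(length)[:half],
}

results = {}
for name, fn in candidates.items():
    # best of 5 repeats of 20 calls each, converted to milliseconds per call
    best = min(timeit.repeat(fn, repeat=5, number=20)) / 20
    results[name] = best * 1e3
    print(f"{name:>24}: {results[name]:.3f} ms per call")
```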
Answer:
Use numpy.random.permutation to generate random indices without repetition, take the first half of them, and then assign a random array of the same size to those positions. Because r[:, -1] is a view into r, the assignment updates r itself:
>>> r = np.array([[0.5, 0.2, 0.6],
... [0.8, 0.1, 0.3],
... [0.4, 0.5, 0.4],
... [0.3, 0.2, 0.9]])
>>> last = r[:, -1]
>>> last[np.random.permutation(last.size)[:last.size // 2]] = np.random.rand(last.size // 2)
>>> r
array([[0.5 , 0.2 , 0.6 ],
       [0.8 , 0.1 , 0.56898452],
       [0.4 , 0.5 , 0.4 ],
       [0.3 , 0.2 , 0.67314702]])
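Another way to select exactly half of the rows is to build a boolean mask with `n // 2` `True` entries and shuffle it. This is an alternative technique, not one from the answers here, and whether it beats index-based selection would need to be measured. Drawing an independent coin flip per row (`rng.random(n) < 0.5`) is cheaper still, but then only approximately 50% of the values change. A minimal sketch, assuming a NumPy `Generator`:

```python
import numpy as np

rng = np.random.default_rng(7)
r = np.array([[0.5, 0.2, 0.6],
              [0.8, 0.1, 0.3],
              [0.4, 0.5, 0.4],
              [0.3, 0.2, 0.9]])
n = r.shape[0]
k = n // 2

mask = np.zeros(n, dtype=bool)
mask[:k] = True
rng.shuffle(mask)            # exactly k True values, now in random positions
r[mask, -1] = rng.random(k)  # boolean indexing selects those k rows
```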
Answer:
You can use (where a is your array):
import numpy as np
n = a.shape[0]
idx = np.random.choice(n, size=n//2, replace=False)
a[idx, 2] = -1  # or, for random values: np.random.rand(n//2)
Example output (using the -1 placeholder):
[[ 0.5 0.2 -1. ]
[ 0.8 0.1 0.3]
[ 0.4 0.5 -1. ]
[ 0.3 0.2 0.9]]
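The `-1` placeholder makes it easy to see which rows were touched. If the caller needs that information while still writing random values, one option is to return the chosen indices. A sketch generalizing to any column and fraction; `randomize_fraction` and its `frac` parameter are hypothetical names, and the count is rounded down with `int(frac * n)`:

```python
import numpy as np

def randomize_fraction(a, col, frac, rng):
    """Overwrite a[:, col] at a random `frac` of the rows; return the row indices."""
    n = a.shape[0]
    k = int(frac * n)                           # rounded down, e.g. 0.5 * 5 -> 2
    idx = rng.choice(n, size=k, replace=False)  # k distinct rows
    a[idx, col] = rng.random(k)                 # exactly k new values for k rows
    return idx

rng = np.random.default_rng(0)
a = np.array([[0.5, 0.2, 0.6],
              [0.8, 0.1, 0.3],
              [0.4, 0.5, 0.4],
              [0.3, 0.2, 0.9]])
changed = randomize_fraction(a, 2, 0.5, rng)
print(np.sort(changed))
```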