How to randomly replace n number of columns and m number of rows with zero value from a 2d numpy arr

Time:10-03

I want to add noise to a dataset by setting n randomly selected columns to zero in each of m randomly selected rows. So with n = 3 and m = 5, five rows are chosen at random, and in each of them three randomly chosen entries are replaced with zero.

For example, with n = 3 (columns) and m = 5 (rows):

array([[10, 6, 1, 4, 8, 11, 12],
       [3, 2, 6, 7, 6, 2, 3],
       [1, 3, 2, 1, 10, 4, 9],
       [8, 1, 2, 4, 11, 12, 13],
       [3, 9, 5, 3, 4, 14, 4]])

one possible output would be

array([[10, 6, **0**, **0**, **0**, 11, 12],
       [**0**, 2, **0**, 7, **0**, 2, 3],
       [1, 3, 2, **0**, 10, **0**, **0**],
       [8, 1, 2, 4, **0**, **0**, **0**],
       [3, 9, **0**, 3, **0**, 14, **0**]])

And with n = 1 (columns) and m = 2 (rows):

array([[10, 6, 1, 4, 8, 11, 12],
       [3, 2, 6, 7, 6, 2, 3],
       [1, 3, 2, 1, 10, 4, 9],
       [8, 1, 2, 4, 11, 12, 13],
       [3, 9, 5, 3, 4, 14, 4]])

one possible output would be

array([[10, **0**, 1, 4, 8, 11, 12],
       [3, 2, 6, 7, 6, 2, 3],
       [1, 3, 2, 1, **0**, 4, 9],
       [8, 1, 2, 4, 11, 12, 13],
       [3, 9, 5, 3, 4, 14, 4]])

Thanks in advance if anyone can help

CodePudding user response:

Try this:

import numpy as np
np.random.seed(123)
n = 3 #(columns)
m = 5 #(rows)

arr = np.array([[10, 6, 1, 4, 8, 11, 12],
                [3, 2, 6, 7, 6, 2, 3],
                [1, 3, 2, 1, 10, 4, 9],
                [8, 1, 2, 4, 11, 12, 13],
                [3, 9, 5, 3, 4, 14, 4]])

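# For each of the m rows, draw n distinct column indices (result shape: m x n)
msk = np.array([np.random.choice(arr.shape[1], size=n, replace=False)
                for _ in range(m)])

# Rows to modify: the first m rows (all of them here, since m equals the row count)
selected_rows = np.arange(m)
# The column vector of row indices broadcasts against msk,
# pairing each row with its own set of columns
arr[selected_rows[:, None], msk] = 0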
print(arr)

Output:

[[10  0  1  0  0 11 12]
 [ 0  2  6  7  0  2  0]
 [ 1  3  2  0  0  0  9]
 [ 8  1  0  4  0  0 13]
 [ 3  0  0  0  4 14  4]]
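
The indexing step in the snippet above may be easier to follow on a small fixed example. The sketch below is only an illustration: the names grid, rows and cols are made up, and the indices are hard-coded instead of random so the result is predictable.

```python
import numpy as np

# A small 3 x 4 array with known values
grid = np.arange(12).reshape(3, 4)

# Hard-coded stand-ins for the randomly drawn indices
rows = np.array([0, 2])        # rows to modify
cols = np.array([[1, 3],       # columns to overwrite in row 0
                 [0, 2]])      # columns to overwrite in row 2

# rows[:, None] has shape (2, 1) and broadcasts against cols (2, 2),
# so row 0 is paired with columns 1 and 3, row 2 with columns 0 and 2
grid[rows[:, None], cols] = -1

print(grid)
# [[ 0 -1  2 -1]
#  [ 4  5  6  7]
#  [-1  9 -1 11]]
```

Row 1 is untouched because it does not appear in rows.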

Or, if you also want the rows themselves to be chosen at random, you can try the version below (it reuses np, n and m from the first snippet).

np.random.seed(123)
arr = np.array([[10, 6, 1, 4, 8, 11, 12],
                [3, 2, 6, 7, 6, 2, 3],
                [1, 3, 2, 1, 10, 4, 9],
                [8, 1, 2, 4, 11, 12, 13],
                [3, 9, 5, 3, 4, 14, 4]])

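# Draw n distinct column indices for each of m candidate rows
msk = np.array([np.random.choice(arr.shape[1], size=n, replace=False)
                for _ in range(m)])

# Draw 0 or 1 for every row and keep the rows that hit the maximum value;
# the number of selected rows is therefore random, not fixed
rnd = np.random.choice(2, size=arr.shape[0])
selected_rows = np.flatnonzero(rnd == rnd.max())
msk = msk[selected_rows]

arr[selected_rows[:, None], msk] = 0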

print(arr)

Output:

[[10  0  1  0  0 11 12]
 [ 3  2  6  7  6  2  3]
 [ 1  3  2  1 10  4  9]
 [ 8  1  0  4  0  0 13]
 [ 3  9  5  3  4 14  4]]
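
The second snippet above zeroes a random number of rows, whereas the question asks for exactly m rows. A minimal sketch of that variant follows. It is not part of the original answer: the function name zero_random_cells and its parameters are hypothetical, and it uses NumPy's newer np.random.default_rng generator in place of np.random.seed.

```python
import numpy as np

def zero_random_cells(arr, n_cols, n_rows, rng=None):
    """Return a copy of arr with n_cols random entries set to zero
    in each of n_rows randomly chosen rows (hypothetical helper)."""
    rng = np.random.default_rng(rng)  # accepts None, an int seed, or a Generator
    out = arr.copy()                  # leave the caller's array untouched
    total_rows, total_cols = out.shape

    # Pick exactly n_rows distinct rows
    rows = rng.choice(total_rows, size=n_rows, replace=False)

    # Argsort of a random matrix gives an independent permutation of the
    # column indices per row; the first n_cols of each are distinct columns
    cols = rng.random((n_rows, total_cols)).argsort(axis=1)[:, :n_cols]

    # rows[:, None] broadcasts against cols, pairing each row with its columns
    out[rows[:, None], cols] = 0
    return out

arr = np.array([[10, 6, 1, 4, 8, 11, 12],
                [3, 2, 6, 7, 6, 2, 3],
                [1, 3, 2, 1, 10, 4, 9],
                [8, 1, 2, 4, 11, 12, 13],
                [3, 9, 5, 3, 4, 14, 4]])

# n = 1 column in each of m = 2 rows; the seed is arbitrary
print(zero_random_cells(arr, n_cols=1, n_rows=2, rng=0))
```

Working on a copy keeps the clean data available, which can be handy when the noisy version is only used for training or testing.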