I want to generate a random binary matrix, so I'm using W=np.random.binomial(1, p, (n,n))
.
It works fine, but I want a constraint that no row is just of 0s.
I create the following function:
def random_matrix(p,n):
m=0
while m==0:
W = np.random.binomial(1, p, (n,n))
m=min(W.sum(axis=1))
return W
It also works fine, but it seems to me too inefficient. Is there a faster way to create this constraint?
CodePudding user response:
One way to make the process of generating a random binary matrix with no rows of only 0s more efficient is to use the np.random.choice function to randomly choose a non-zero entry from each row of the matrix and set its value to 1. This avoids the need to use a while loop and repeatedly check for rows of only 0s, which can be computationally expensive for large matrices.
Here is an example of how you could use the np.random.choice function to generate a random binary matrix with no rows of only 0s:
W = np.random.binomial(1, p, (n,n))
for row in W:
nonzero_indices = np.where(row != 0)[0]
if nonzero_indices.size == 0:
random_index = np.random.randint(0, n)
row[random_index] = 1
else:
random_index = np.random.choice(nonzero_indices)
row[random_index] = 1
CodePudding user response:
When the matrix is large, regenerating the entire matrix just because few rows are full of zeros is not efficient. It should be statistically safe to only regenerate the target rows. Here is an example:
def random_matrix(p,n):
W = np.random.binomial(1, p, (n,n))
while True:
null_rows = np.where(W.sum(axis=1) == 0)[0]
# If there is no null row, then m>0 so we stop the replacement
if null_rows.size == 0:
break
# Replace only the null rows
W[null_rows] = np.random.binomial(1, p, (null_rows.shape[0],n))
return W
Even faster solutions
There is an even more efficient approach when p
is close to 0 (when p
is close to 1, then the above function is already fast). Indeed, a binomial random variable with 0-1 values is a Bernoulli random variable. The sum of Bernoulli random values with a probability p repeated many times is a binomial random value! Thus, you can generate the sum for all row using S = np.random.binomial(n, p, (n,n))
, then apply the above method to remove null values and then build the final matrix by generating S[i]
one values for the ith row and use np.shuffle
so to randomize the order of the 0-1 values in each row. This method solve conflicts much more efficiently than all others. Indeed, it does not need to generate the full row to check if it is full of zeros. It is n
times faster to solve conflicts!
If this is not enough, you can use the uint8
datatype to generate W
. Indeed, the memory is slow so generating smaller matrices is generally faster, not to mention it takes less RAM.
If this is not enough, you can generate S
item per item using Numba JIT compiler and a basic loop. This should be faster since there is no temporary array to create except the final one. For large matrices, this algorithm can even be parallelized (every row can be independently generated). This last solution should be close to be optimal.