Create a matrix of 0s and 1s, such that each row has only one 1 and each column has at least two 1s

I want to create a 100*4 matrix of 0s and 1s, such that each row has only one 1 and each column has at least two 1s, in R.

MyMat <- as.matrix(rsparsematrix(nrow=100, ncol=4, nnz  = 100))

I am thinking of rsparsematrix (from the Matrix package), but I am not sure how to apply my required conditions.

Edit: My other attempt was dummy_cols, but either way I am stuck on applying the two conditions. I guess there must be a more straightforward way of creating such a matrix.

CodePudding user response：

Stochastically, with two rules:

All rows must have exactly one 1; and
All columns must have at least two 1s.

I control the first implicitly by construction; I test against the second.

nr <- 100 ; nc <- 4
set.seed(42)
lim <- 10000
while (lim > 0) {
  lim <- lim - 1
  M <- t(replicate(nr, sample(c(1, rep(0, nc-1)))))
  if (all(colSums(M > 0) >= 2)) break
}
head(M)
#      [,1] [,2] [,3] [,4]
# [1,]    1    0    0    0
# [2,]    0    0    0    1
# [3,]    0    0    0    1
# [4,]    0    1    0    0
# [5,]    0    0    0    1
# [6,]    0    1    0    0

colSums(M)
# [1] 25 30 21 24

lim
# [1] 9999

My use of lim is hardly needed in this example, but is there as a mechanism to stop this from running infinitely: if you change the dimensions and/or the rules, it might become highly unlikely or infeasible to meet all rules, so this keeps the execution time limited. (10000 is completely arbitrary.)
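To illustrate why the attempt limit matters, here is a minimal sketch that wraps the same loop in a function and returns NULL when the budget runs out. The function name make_matrix and its arguments are hypothetical, not part of the original answer.

```r
# Hypothetical wrapper around the rejection loop; names are illustrative only.
make_matrix <- function(nr, nc, min_per_col = 2, lim = 10000) {
  while (lim > 0) {
    lim <- lim - 1
    # Each row is a random shuffle of one 1 and (nc - 1) 0s.
    M <- t(replicate(nr, sample(c(1, rep(0, nc - 1)))))
    if (all(colSums(M) >= min_per_col)) return(M)
  }
  NULL  # no valid matrix found within the attempt budget
}

set.seed(42)
ok  <- make_matrix(100, 4)           # feasible: succeeds almost immediately
bad <- make_matrix(5, 4, lim = 100)  # infeasible: 5 ones cannot give 4 columns two 1s each
is.null(bad)  # TRUE
```

With 5 rows and 4 columns the rules cannot be met (at least 8 ones are needed), so without the limit the loop would never terminate.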

My point in the comment is that it would be rather difficult to find a 100x4 matrix that matches rule 1 but does not match rule 2. In fact, since the probabilities of a 0 or a 1 in any one cell are 0.75 and 0.25, respectively, the probability that a given column (across 100 rows) contains fewer than two 1s is around 1.1e-11.
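That figure can be reproduced with the binomial distribution function, assuming each row places its single 1 uniformly at random so that a column's count of 1s is Binomial(100, 0.25). The variable names below are illustrative.

```r
# Probability that one given column has fewer than two 1s (i.e. 0 or 1 of them).
p_col <- pbinom(1, size = 100, prob = 0.25)
p_col  # about 1.1e-11

# Union bound over the 4 columns: an upper bound on the chance that
# any column fails rule 2 (column counts are dependent, so this is not exact).
p_any <- 4 * p_col
p_any  # about 4.4e-11
```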

CodePudding user response：

1) A matrix consisting of 25 4x4 identity matrices stacked on top of one another satisfies these requirements:

m <- matrix(1, 25) %x% diag(4)

2) Exchanging the two arguments of %x% would also work and gives a different matrix that also satisfies the conditions.

3) Any permutation of the rows and the columns of the two solution matrices in (1) and (2) would also satisfy the conditions.

m[sample(100), sample(4)]

4) If the objective is to generate a random table containing 0/1 values whose row sums are each 1 and whose column sums are each 25 then use r2dtable:

r <- r2dtable(1, rep(1, 100), rep(25, 4))[[1]]

5) or if it is desired to allow any column sums of at least 2 then:

rsums <- rep(1, 100)
csums <- rmultinom(1, 92, rep(0.25, 4)) + 2
r <- r2dtable(1, rsums, csums)[[1]]
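As a sanity check, the constructions in (1) to (5) can all be run through a small validator. This is only a sketch: is_valid and the names m1 to m5 are hypothetical, and as.vector() is added here just to make csums a plain vector.

```r
# Hypothetical helper: TRUE when the matrix is 0/1, every row has exactly
# one 1, and every column has at least min_per_col 1s.
is_valid <- function(m, min_per_col = 2) {
  all(m %in% c(0, 1)) &&
    all(rowSums(m) == 1) &&
    all(colSums(m) >= min_per_col)
}

set.seed(1)
m1 <- matrix(1, 25) %x% diag(4)                   # (1) stacked identity blocks
m2 <- diag(4) %x% matrix(1, 25)                   # (2) arguments exchanged
m3 <- m1[sample(100), sample(4)]                  # (3) rows and columns permuted
m4 <- r2dtable(1, rep(1, 100), rep(25, 4))[[1]]   # (4) column sums fixed at 25
csums <- as.vector(rmultinom(1, 92, rep(0.25, 4))) + 2
m5 <- r2dtable(1, rep(1, 100), csums)[[1]]        # (5) random column sums >= 2

sapply(list(m1, m2, m3, m4, m5), is_valid)
```

All five entries should come back TRUE.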

CodePudding user response：

Here is a simple way to generate the 100 rows with the 1s randomly positioned and then create the matrix by transposing the rows object. The matrix generation is wrapped in a while loop (thanks, r2evans) to ensure each column contains at least two 1s.

minval <- 0
while(minval < 2) {
  rows <- replicate(100, sample(c(0,0,0,1), 4))
  m <- t(rows)
  minval <- min(colSums(m))
}
head(m, 10)
       [,1] [,2] [,3] [,4]
  [1,]    0    0    0    1
  [2,]    1    0    0    0
  [3,]    0    0    0    1
  [4,]    0    0    1    0
  [5,]    1    0    0    0
  [6,]    0    0    0    1
  [7,]    1    0    0    0
  [8,]    0    0    1    0
  [9,]    0    1    0    0
 [10,]    1    0    0    0

CodePudding user response：

Code:

v <- tabulate(sample(1:4, 100-2*4, replace=TRUE), nbins=4) + 2
m <- diag(length(v))[sample(rep(seq_along(v), v)),]

Result check:

> dim(m)
[1] 100   4
> range(rowSums(m))
[1] 1 1
> range(colSums(m))
[1] 20 30

This works with any matrix size, provided the number of rows is at least twice the number of columns: just adjust the numbers 4 (the number of columns) and 100 (the number of rows). For example, for a 200x10 matrix:

v <- tabulate(sample(1:10, 200-2*10, replace=TRUE), nbins=10) + 2
m <- diag(length(v))[sample(rep(seq_along(v), v)),]

> dim(m)
[1] 200  10
> range(rowSums(m))
[1] 1 1
> range(colSums(m))
[1] 15 31

Explanation: this works backwards from the properties of the resulting matrix. If you have 100 rows and 4 columns, with each row having exactly one 1, then the matrix will have 100 1s in total, which means that the column sums must also add up to 100. So we start with a vector of numbers (summing to 100) that represents how many 1s each column will have. Say this vector is c(50,25,20,5). This tells us that there will be 50 rows of the form (1,0,0,0), 25 rows of the form (0,1,0,0), and so on. The final step is to generate all these rows and shuffle them.
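A small worked sketch of that idea, using the fixed example vector c(50,25,20,5) in place of random column sums (the name idx is illustrative):

```r
v <- c(50, 25, 20, 5)         # desired number of 1s per column, sums to 100
idx <- rep(seq_along(v), v)   # column index of the 1 for each row: 1 x50, 2 x25, ...
length(idx)                   # 100

# Row i of diag(4) is the unit row with its 1 in column i, so indexing the
# identity matrix by the shuffled indices builds and shuffles the rows at once.
m <- diag(length(v))[sample(idx), ]

dim(m)      # 100 4
colSums(m)  # 50 25 20 5
```

Shuffling only reorders the rows, so the column sums always equal v regardless of the random seed.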

The trick here:

v <- tabulate(sample(1:4, 100-2*4, replace=TRUE), nbins=4) + 2

is to generate random column sums while making sure the minimum is at least 2. We do this by generating values summing to 92 and then adding 2 to each value (which, with 4 columns, accounts for the remaining 8).