Given a matrix like mat
> set.seed(1)
> mat <- matrix(rbinom(100,1,0.5),10,10)
> rownames(mat) <- paste0(sample(LETTERS[1:2],10,replace=T),c(1:nrow(mat)))
> colnames(mat) <- paste0(sample(LETTERS[1:2],10,replace=T),c(1:ncol(mat)))
> mat
A1 A2 A3 B4 B5 B6 B7 A8 B9 B10
B1 0 0 1 0 1 0 1 0 0 0
B2 0 0 0 1 1 1 0 1 1 0
B3 1 1 1 0 1 0 0 0 0 1
A4 1 0 0 0 1 0 0 0 0 1
A5 0 1 0 1 1 0 1 0 1 1
A6 1 0 0 1 1 0 0 1 0 1
A7 1 1 0 1 0 0 0 1 1 0
B8 1 1 0 0 0 1 1 0 0 0
A9 1 0 1 1 1 1 0 1 0 1
A10 0 1 0 0 1 0 1 1 0 1
I want to sample submatrices of the form:
A B
A 0 1
B 1 0
EDIT: Specifically, the submatrix should contain a 1 in the A-row/B-column, a 1 in the B-row/A-column, and 0s in the other two cells.
I can do this by randomly selecting one A-row and one B-row, then randomly selecting one A-column and one B-column, then checking whether it has the desired pattern. But, I'm trying to find a more efficient method that will work even in large/sparse matrices. Thanks!
CodePudding user response:
You can sample
on the dimension names:
set.seed(1)
dims <- lapply(dimnames(mat), \(x) c(sample(which(grepl("A", x)), 1), sample(which(grepl("B", x)), 1)))
mat[dims[[1]], dims[[2]]]
A3 B4
A4 0 0
B8 0 0
CodePudding user response:
A solution that first finds all matching submatrices, then samples on them.
library(Matrix)
set.seed(1)
m <- as(matrix(rbinom(100,1,0.5),10,10),"sparseMatrix")
m2 <- as(m[-1, -ncol(m)]*m[-nrow(m), -1]*!(m[-1,-1] | m[-nrow(m), -ncol(m)]), "dgTMatrix")
length(m2@i) # number of matching submatrices
#> [1] 7
# sample matching submatrices
idx <- sample(length(m2@i), 5)
# put the row, column indices of the top left of the submatrices into a
# data.frame for further processing
(topleft <- data.frame(i = m2@i[idx], j = m2@j[idx])) 1L
#> i j
#> 1 1 6
#> 2 6 9
#> 3 5 1
#> 4 7 7
#> 5 5 8
# timing on a 10k square matrix with 1M elements
idx <- sample(1e8, 1e6)
m <- sparseMatrix(i = ((idx - 1) %% 1e4) 1, j = ((idx - 1) %/% 1e4) 1)
system.time({
m2 <- as(m[-1, -ncol(m)]*m[-nrow(m), -1]*!(m[-1,-1] m[-nrow(m), -ncol(m)]), "dgTMatrix")
# sample 1k matching submatrices
idx <- sample(length(m2@i), 1e3)
# put the row, column indices of the top left of the submatrices into a
# data.frame for further processing
topleft <- data.frame(i = m2@i[idx], j = m2@j[idx]) 1L
})
#> user system elapsed
#> 3.23 1.17 4.43
length(m2@i) # number of matching submatrices
#> [1] 9841