I have a big correlation matrix and I want to get the indices of the columns that contain a 1 (ignoring the diagonal). Here is what I tried, which doesn't work:
# A toy example
m <- matrix(c(1,0,1,0,1,0,1,0,1), ncol = 3 , nrow = 3)
m <- m - diag(nrow(m)) #Removing 1's from the diagonal
col(m[which(m == 1)])
Is there any efficient solution for gigantic matrices?
CodePudding user response:
Base R
I'm not sure this is efficient enough for your case, but you could try:
cols <- c()
for (i in seq_len(ncol(m))) {
  # keep column i if it has no NAs and contains at least one 1
  if (!anyNA(m[, i]) && any(m[, i] == 1)) {
    cols <- c(cols, i)
  }
}
cols
[1] 1 3
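The same result can be obtained without an explicit loop. This is a minimal sketch, assuming the goal is simply "columns that contain at least one 1" and that NA entries should be ignored rather than disqualify a column:

```r
# Toy matrix from the question, with the diagonal 1s removed
m <- matrix(c(1, 0, 1, 0, 1, 0, 1, 0, 1), ncol = 3, nrow = 3)
m <- m - diag(nrow(m))

# m == 1 is a logical matrix; colSums() counts the TRUEs in each column
# (na.rm = TRUE skips NA entries), and which() returns the indices of
# the columns whose count is positive.
cols <- which(colSums(m == 1, na.rm = TRUE) > 0)
cols
#> [1] 1 3
```

Because colSums() is vectorized, this avoids growing cols inside a loop, although m == 1 still allocates a logical matrix the same size as m.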
dplyr style
Following your comment, I'll convert m to a data frame first.
library(dplyr)
m <- as.data.frame(m)
n <- m %>%
  summarise(across(everything(), ~ any(.x == 1))) %>%
  select(where(isTRUE)) %>%
  names()
which(colnames(m) %in% n)
[1] 1 3
CodePudding user response:
A solution that takes advantage of the arr.ind argument of which():
# data
m <- matrix(c(1,0,1,0,1,0,1,0,1), ncol = 3 , nrow = 3)
m <- m - diag(nrow(m))
# extract array indices of those elements which are equal to 1
(idx <- which(m == 1, arr.ind = TRUE))
#> row col
#> [1,] 3 1
#> [2,] 1 3
# extract the (unique) col indices
unique(idx[, "col"])
#> [1] 1 3
# Let's test it with a bigger matrix
set.seed(1)
sample_0_1 <- sample(c(0L, 1L), size = 1e8, replace = TRUE, prob = c(0.999999, 0.000001))
m <- matrix(sample_0_1, nrow = 1e4, ncol = 1e4)
system.time(unique(which(m == 1L, arr.ind = TRUE)[, "col"]))
#> user system elapsed
#> 0.36 0.08 0.59
Created on 2021-09-30 by the reprex package (v2.0.1)
This is probably not the best approach, but it may be good enough for your problem.
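For reference, the attempt in the question fails because m[which(m == 1)] is a plain vector, while col() expects a matrix-like object. Below is a minimal sketch of what that line was presumably aiming for, subsetting col(m) with the logical matrix instead. Note that col(m) allocates an integer matrix as large as m, so for very large inputs the which(..., arr.ind = TRUE) approach is likely the lighter option.

```r
# Toy matrix from the question, with the diagonal 1s removed
m <- matrix(c(1, 0, 1, 0, 1, 0, 1, 0, 1), ncol = 3, nrow = 3)
m <- m - diag(nrow(m))

# col(m) has the same shape as m and holds each cell's column number;
# subsetting it with the logical matrix m == 1 keeps the column numbers
# of the matching cells, and unique() drops repeats.
cols <- unique(col(m)[m == 1])
cols
#> [1] 1 3
```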