I have a large correlation matrix and I want the column numbers of the columns that contain a 1 anywhere except on the diagonal. Here is what I tried, which doesn't work:

# A toy example

m <- matrix(c(1,0,1,0,1,0,1,0,1), ncol = 3 , nrow = 3)
m <- m - diag(nrow(m)) #Removing 1's from the diagonal

col(m[which(m == 1)])

Is there any efficient solution for gigantic matrices?

CodePudding user response：

Base R

I'm not sure this is efficient enough for your case, but you could try:

cols <- c()
for (i in seq_len(ncol(m))) {
  # any() wraps only the comparison; na.rm = TRUE ignores NA entries
  if (any(m[, i] == 1, na.rm = TRUE)) {
    cols <- c(cols,i)
  }
}
cols

[1] 1 3
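
The loop above can probably be replaced by a single vectorised call. This is a minimal sketch using `colSums()` rather than the loop; treating NA entries as "not 1" via `na.rm = TRUE` is my assumption, and I haven't benchmarked it on a large matrix.

```r
# Toy matrix from the question, with the diagonal zeroed out
m <- matrix(c(1, 0, 1, 0, 1, 0, 1, 0, 1), ncol = 3, nrow = 3)
m <- m - diag(nrow(m))

# m == 1 is a logical matrix; colSums() counts the TRUEs in each column.
# na.rm = TRUE makes NA entries count as "not 1" (an assumption about the data).
cols <- which(colSums(m == 1, na.rm = TRUE) > 0)
cols
#> [1] 1 3
```

Note that `m == 1` still builds a full logical matrix the same size as `m`, so memory use grows with the matrix.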

`dplyr` style

Following your comment, I first convert m to a data frame.

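library(dplyr)

m <- as.data.frame(m)
n <- m %>%
  # one TRUE/FALSE flag per column: does it contain a 1?
  summarise(across(everything(), ~ any(.x == 1, na.rm = TRUE))) %>%
  # keep only the columns whose flag is TRUE
  select(where(isTRUE)) %>%
  names()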
which(colnames(m) %in% n)

[1] 1 3

CodePudding user response：

A solution that takes advantage of the arr.ind argument in which():

# data
m <- matrix(c(1,0,1,0,1,0,1,0,1), ncol = 3 , nrow = 3)
m <- m - diag(nrow(m))

# extract array indices of those elements which are equal to 1
(idx <- which(m == 1, arr.ind = TRUE))
#>      row col
#> [1,]   3   1
#> [2,]   1   3
# extract the (unique) col indices
unique(idx[, "col"])
#> [1] 1 3

# Let's test it with a bigger matrix
set.seed(1)
sample_0_1 <- sample(c(0L, 1L), size = 1e8, replace = TRUE, prob = c(0.999999, 0.000001))
m <- matrix(sample_0_1, nrow = 1e4, ncol = 1e4)

system.time(unique(which(m == 1L, arr.ind = TRUE)[, "col"]))
#>    user  system elapsed 
#>    0.36    0.08    0.59

Created on 2021-09-30 by the reprex package (v2.0.1)

This is probably not the best approach, but it may be good enough for your problem.
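
As for why the attempt in the question fails: `m[which(m == 1)]` returns a plain vector of the matching values, and `col()` needs a matrix-like object. A sketch of what was presumably intended is to index `col(m)` instead:

```r
# Toy matrix from the question, with the diagonal zeroed out
m <- matrix(c(1, 0, 1, 0, 1, 0, 1, 0, 1), ncol = 3, nrow = 3)
m <- m - diag(nrow(m))

# col(m) is an integer matrix holding each cell's column number.
# Indexing it with the positions where m == 1 gives the columns of those cells.
cols <- unique(col(m)[which(m == 1)])
cols
#> [1] 1 3
```

Keep in mind that `col(m)` allocates a second integer matrix with the same dimensions as `m`, so for a very large matrix the `arr.ind = TRUE` version is likely the lighter option.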
