Efficient way to check large sparse matrix for non-finite values in R

Time:08-06

I have a large sparse matrix in R. After populating the matrix with some math, I realized it contained some infinite values caused by division by zero.

How can I check the matrix for non-finite values?

Here is an example where trying to find the non-finite values fails with an error:

library(Matrix)
A <- Matrix(nrow = 150000, ncol = 150000, data = 0, sparse = TRUE)
A[1, 1] <- Inf
A[1, 3] <- NA
A[2, 1] <- -Inf
# is.finite(A) is computed for every cell, including the implicit zeros
test <- A[!is.finite(A)]
Error: cannot allocate vector of size 83.8 Gb

Below I manage to do this by checking one row at a time, but it takes forever. Is there a better way?

library(magrittr)
for(i in 1:nrow(A)){
    if((
        A[i, ] %>% .[!is.finite(.)] %>% length
    ) > 0) print(i)
}

I also tried running it in parallel, but I think it is overkill and it still takes a long time.

library(parallel)
library(magrittr)

numCores <- detectCores() - 1
cl <- makeCluster(numCores)
clusterExport(cl, c("A"))
clusterEvalQ(cl, library(Matrix))   # workers need Matrix to index the sparse A
clusterEvalQ(cl, library(magrittr))
out <- A %>% nrow %>% seq %>% parLapply(cl, X = ., function(i) A[i, ] %>% .[!is.finite(.)]) 

For example, it would be great if somehow I could force the output of A[!is.finite(A)] to be a sparse matrix, since 99.9999% of the output would be FALSE.

CodePudding user response:

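If we want to know whether a sparse matrix A has any Inf, -Inf, NaN or NA, we only need to test its explicitly stored entries, which sit in the x slot (the implicit zeros are finite by definition):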

any(!is.finite(A@x))
#[1] TRUE
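
As a rough illustration of why this is cheap, the sketch below rebuilds the example with sparseMatrix() (my choice for a compact setup, not the code from the question) and compares the number of stored entries with the number of cells. It assumes A ends up as a numeric compressed-column matrix (dgCMatrix), which is what sparseMatrix() returns here. A dense logical result would need 150000 x 150000 x 4 bytes, roughly 83.8 GiB, which matches the allocation error in the question.

```r
library(Matrix)

# Same three problem entries as in the question, built in one call
A <- sparseMatrix(i = c(1, 1, 2), j = c(1, 3, 1),
                  x = c(Inf, NA, -Inf), dims = c(150000, 150000))

n_stored <- length(A@x)     # entries that are actually stored
n_cells  <- prod(dim(A))    # cells a dense result would have to cover

# Only the stored entries are inspected
has_bad <- any(!is.finite(A@x))
has_bad
```

The check touches n_stored values rather than n_cells, so its cost grows with the number of stored entries, not with the matrix dimensions.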

If we also want to know their positions, we can filter the (i, j, x) triplets of stored entries that summary(A) returns:

subset(summary(A), !is.finite(x))
  i j    x
1 1 1  Inf
2 2 1 -Inf
3 1 3   NA
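
The question also asks for a sparse version of !is.finite(A). One possible sketch, assuming A is a dgCMatrix, reads the row indices from the i slot and derives the column indices from the p slot, then builds a sparse logical mask from those positions. The names bad, rows, cols and mask are illustrative only.

```r
library(Matrix)

A <- sparseMatrix(i = c(1, 1, 2), j = c(1, 3, 1),
                  x = c(Inf, NA, -Inf), dims = c(150000, 150000))

# Positions, within the stored entries, of the non-finite values
bad <- which(!is.finite(A@x))

# The i slot holds 0-based row indices of the stored entries
rows <- A@i[bad] + 1L

# diff(p) is the number of stored entries per column, so expanding it
# yields the column index of every stored entry
cols <- rep(seq_len(ncol(A)), times = diff(A@p))[bad]

# Sparse logical matrix that is TRUE only at the non-finite cells
mask <- sparseMatrix(i = rows, j = cols, x = rep(TRUE, length(bad)),
                     dims = dim(A))
```

Here mask has the same dimensions as A but stores only three entries, so it never comes close to the dense allocation that failed in the question.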

Remark:

See R: element-wise matrix division for distinctions between is.infinite, !is.finite, is.na and is.nan.
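
For a quick reminder that does not depend on the linked post, this small base R sketch tabulates the four predicates on one value of each kind:

```r
x <- c(1, Inf, -Inf, NaN, NA)

checks <- data.frame(
  x        = x,
  finite   = is.finite(x),    # TRUE only for ordinary numbers
  infinite = is.infinite(x),  # TRUE for Inf and -Inf only
  na       = is.na(x),        # TRUE for both NaN and NA
  nan      = is.nan(x)        # TRUE for NaN only
)
checks
```

So !is.finite() is the only one of the four that catches Inf, -Inf, NaN and NA in a single test, which is why the answer uses it.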
