Looking for a more efficient way to filter an array

I have two arrays obtained from krige(), values and variances, each with a couple of million entries. The two arrays have the same length and match 1:1 with each other. I want to remove values whose variance is above a certain threshold. I don't really need to modify values in place; generating a third array would be fine.

The following code works fine:

    for (i in 1:length(values)) {
      if (variances[i] > 0.8) {
        values[i] = NA
      }
    }

Unfortunately, it is very slow and uses only a single processor core. Do I really need to handle the parallel calculations manually? This sounds generic enough that it should be built in somehow, not only by using more than one core, but perhaps also through vector processor instructions.

Please enlighten me.

CodePudding user response:

As long as the two arrays match, you should be able to subset one with a logical mask built from the other:

    set.seed(1)
    (values <- array(1:25, c(5,5)))
    #>      [,1] [,2] [,3] [,4] [,5]
    #> [1,]    1    6   11   16   21
    #> [2,]    2    7   12   17   22
    #> [3,]    3    8   13   18   23
    #> [4,]    4    9   14   19   24
    #> [5,]    5   10   15   20   25

    (variances <- array(rnorm(25,.8,0.2),c(5,5)))
    #>           [,1]      [,2]      [,3]      [,4]      [,5]
    #> [1,] 0.6747092 0.6359063 1.1023562 0.7910133 0.9837955
    #> [2,] 0.8367287 0.8974858 0.8779686 0.7967619 0.9564273
    #> [3,] 0.6328743 0.9476649 0.6757519 0.9887672 0.8149130
    #> [4,] 1.1190562 0.9151563 0.3570600 0.9642442 0.4021297
    #> [5,] 0.8659016 0.7389223 1.0249862 0.9187803 0.9239651

    is.na(values[variances > .8]) <- TRUE

    values
    #>      [,1] [,2] [,3] [,4] [,5]
    #> [1,]    1    6   NA   16   NA
    #> [2,]   NA   NA   NA   17   NA
    #> [3,]    3   NA   13   NA   NA
    #> [4,]   NA   NA   14   NA   24
    #> [5,]   NA   10   NA   NA   NA
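
Since generating a third array would also be fine, a non-destructive variant may be useful. The sketch below uses base R's `replace()`, which returns a modified copy and leaves `values` untouched. The name `filtered` and the small inputs are only illustrative, not part of the original data.

```r
# Illustrative inputs; in practice these would come from krige()
values    <- array(1:6, c(2, 3))
variances <- array(c(0.5, 0.9, 0.7, 1.2, 0.8, 0.3), c(2, 3))

# replace() returns a copy with the selected elements set to NA
filtered <- replace(values, variances > 0.8, NA)

# The original array is unchanged and the dimensions are preserved
stopifnot(!anyNA(values), identical(dim(filtered), dim(values)))
filtered
```

The comparison is strict, so an element whose variance is exactly 0.8 is kept.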

For arrays of 10 million elements, it takes about a second on my laptop, data generation included:

    system.time({
      values <- array(1:10e6, c(1000,10000))
      variances <- array(rnorm(10e6,.8,0.2),dim(values))
      is.na(values[variances > .8]) <- TRUE
    })
    #>    user  system elapsed
    #>    1.05    0.10    1.14

    dim(variances)
    #> [1]  1000 10000
    object.size(variances)
    #> 80000216 bytes
    object.size(values)
    #> 40000216 bytes
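
To relate this back to the loop in the question: the vectorized comparison and assignment run in compiled code inside R, which is usually where the speed-up comes from, rather than from using several cores. Below is a rough sketch for checking on your own machine that both approaches give the same result and how their timings compare. The size `n` and the names `by_loop` and `by_mask` are arbitrary choices for illustration, and the timings will vary between machines.

```r
n <- 1e5  # arbitrary size, kept small so the loop finishes quickly
set.seed(1)
values    <- seq_len(n)
variances <- rnorm(n, 0.8, 0.2)

# Element-by-element loop, as in the question
by_loop <- values
t_loop <- system.time(
  for (i in seq_along(by_loop)) {
    if (variances[i] > 0.8) by_loop[i] <- NA
  }
)

# Vectorized logical mask, as in the answer
by_mask <- values
t_mask <- system.time(is.na(by_mask[variances > 0.8]) <- TRUE)

# Both approaches should produce the same vector
stopifnot(identical(by_loop, by_mask))

# Elapsed seconds for each approach
c(loop = t_loop[["elapsed"]], mask = t_mask[["elapsed"]])
```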

Created on 2023-01-18 with reprex v2.0.2