I've been using elev from the actuar package, but it is incredibly slow when there is a lot of data and a lot of limits at which to compute the limited expected value.
The link above explains what the empirical limited expected value is, but in short, the elev of a vector a at a limit l is mean(pmin(a, l)).
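For example, with a small made-up vector (toy numbers, just to illustrate the definition):
x <- c(10, 50, 200)
mean(pmin(x, 100)) # values above the limit are capped at 100: (10 + 50 + 100) / 3 = 53.33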
I wrote my own vectorized function to try to speed up computing the elev of a vector at several limits:
lev <- function(a, L){
  out <- numeric(length = length(L))
  a_sum <- sum(a)
  a_length <- length(a)
  for(i in seq_along(L)){
    out[i] <- (a_sum - sum(a[which(a > L[i])] - L[i])) / a_length
  }
  out
}
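As a quick sanity check against the definition above (x_test and L_test are just small made-up inputs):
x_test <- c(10, 50, 200, 400)
L_test <- c(100, 300)
lev(x_test, L_test) # 65 140
sapply(L_test, function(l) mean(pmin(x_test, l))) # 65 140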
I compared the two on some test data:
a <- seq(1e8)
L <- seq(1e5, 1e8, 1e5)
elev_actuar <- elev(a)
elev_actuar(L) # this takes 1.9 minutes
lev(a, L) # this takes 45 seconds
Why is elev from actuar so much slower? And is there a way to make my function even more efficient?
CodePudding user response:
I think your loop is great. Let's implement it in Rcpp.
rcppfun <- '
Rcpp::NumericVector myfun_cpp(Rcpp::NumericVector a, Rcpp::NumericVector L) {
  int alen = a.size();
  // total of a, computed once
  double asum = 0;
  for (int i = 0; i < alen; i++) {
    asum = asum + a[i];
  }
  int Llen = L.size();
  std::vector<double> out(Llen);
  for (int i = 0; i < Llen; i++) {
    // accumulate the excess of a over the limit L[i]
    double tmp = 0;
    for (int j = 0; j < alen; j++) {
      if (a[j] > L[i]) {
        tmp = tmp + a[j] - L[i];
      }
    }
    out[i] = (asum - tmp) / alen;
  }
  return Rcpp::wrap(out);
}
'
library(Rcpp)
lev_rcpp <- cppFunction(rcppfun)
Usage
lev_rcpp(a, L)
Benchmark
library(actuar)
a <- seq(1e6)
L <- seq(1e4, 1e6, 1e4)
elev_actuar <- elev(a)  # rebuild the elev object for the new, smaller a
stopifnot(all.equal(elev_actuar(L), lev(a, L)) &
          all.equal(elev_actuar(L), lev_rcpp(a, L)))
microbenchmark::microbenchmark(
  elev_actuar(L), lev(a, L), lev_rcpp(a, L), times=3L
)
# Unit: milliseconds
#           expr      min       lq     mean   median       uq      max neval cld
# elev_actuar(L) 911.7452 917.2508 922.8843 922.7564 928.4538 934.1512     3   c
#      lev(a, L) 774.0074 777.0622 778.7712 780.1170 781.1532 782.1893     3  b
# lev_rcpp(a, L) 262.7968 262.8886 262.9204 262.9804 262.9823 262.9842     3 a
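If you prefer to stay in base R, another option is to avoid the double loop altogether by sorting a once and using cumulative sums with findInterval. This is only a sketch (lev_cumsum is a name made up here, not part of actuar):
lev_cumsum <- function(a, L) {
  a_sorted <- sort(a)
  cs <- c(0, cumsum(as.numeric(a_sorted))) # cs[k + 1] = sum of the k smallest values
  n <- length(a)
  k <- findInterval(L, a_sorted) # how many values are <= each limit
  (cs[k + 1] + L * (n - k)) / n # values above the limit each contribute L
}
all.equal(lev_rcpp(a, L), lev_cumsum(a, L)) # should be TRUE
Each limit then costs a binary search instead of a full pass over a, so after the sort the work is roughly O((length(a) + length(L)) * log(length(a))).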