I've been using elev from the actuar package, but it is incredibly slow when there is a lot of data and a lot of limits at which to compute the limited expected value.
The link above explains what the empirical limited expected value is, but in short, the elev of a vector a at a limit l is mean(pmin(a, l)).
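For example, with a small made-up vector (toy numbers, just to illustrate the definition):
x <- c(10, 50, 200)
mean(pmin(x, 100)) # values above the limit are capped at 100: (10 + 50 + 100) / 3 = 53.33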
I wrote my own vectorized function to try to speed up computing the elev of a vector at several limits:
lev <- function(a, L){
  out <- numeric(length = length(L))
  a_sum <- sum(a)
  a_length <- length(a)
  for(i in seq_along(L)){
    out[i] <- (a_sum - sum(a[which(a > L[i])] - L[i])) / a_length
  }
  out
}
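As a quick sanity check against the definition above (x_test and L_test are just small made-up inputs):
x_test <- c(10, 50, 200, 400)
L_test <- c(100, 300)
lev(x_test, L_test) # 65 140
sapply(L_test, function(l) mean(pmin(x_test, l))) # 65 140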
I compared the two on some test data:
a <- seq(1e8)
L <- seq(1e5, 1e8, 1e5)
elev_actuar <- elev(a)
elev_actuar(L) # this takes 1.9 minutes
lev(a, L) # this takes 45 seconds
Why is elev from actuar so much slower? And is there a way to make my function even more efficient?
CodePudding user response:
I think your loop is great. Let's implement it in Rcpp.
rcppfun <- '
Rcpp::NumericVector myfun_cpp(Rcpp::NumericVector a, Rcpp::NumericVector L) {
  int alen = a.size();
  // total of a, computed once
  double asum = 0;
  for (int i = 0; i < alen; i++) {
    asum = asum + a[i];
  }
  int Llen = L.size();
  std::vector<double> out(Llen);
  for (int i = 0; i < Llen; i++) {
    // accumulate the excess of a over the limit L[i]
    double tmp = 0;
    for (int j = 0; j < alen; j++) {
      if (a[j] > L[i]) {
        tmp = tmp + a[j] - L[i];
      }
    }
    out[i] = (asum - tmp) / alen;
  }
  return Rcpp::wrap(out);
}
'
library(Rcpp)
lev_rcpp <- cppFunction(rcppfun)
Usage
lev_rcpp(a, L)
Benchmark
library(actuar)
a <- seq(1e6)
L <- seq(1e4, 1e6, 1e4)
elev_actuar <- elev(a)  # rebuild the elev object for the new, smaller a
stopifnot(all.equal(elev_actuar(L), lev(a, L)) &
          all.equal(elev_actuar(L), lev_rcpp(a, L)))
microbenchmark::microbenchmark(
  elev_actuar(L), lev(a, L), lev_rcpp(a, L), times=3L
)
# Unit: milliseconds
#           expr      min       lq     mean   median       uq      max neval cld
# elev_actuar(L) 911.7452 917.2508 922.8843 922.7564 928.4538 934.1512     3   c
#      lev(a, L) 774.0074 777.0622 778.7712 780.1170 781.1532 782.1893     3  b
# lev_rcpp(a, L) 262.7968 262.8886 262.9204 262.9804 262.9823 262.9842     3 a
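If you prefer to stay in base R, another option is to avoid the double loop altogether by sorting a once and using cumulative sums with findInterval. This is only a sketch (lev_cumsum is a name made up here, not part of actuar):
lev_cumsum <- function(a, L) {
  a_sorted <- sort(a)
  cs <- c(0, cumsum(as.numeric(a_sorted))) # cs[k + 1] = sum of the k smallest values
  n <- length(a)
  k <- findInterval(L, a_sorted) # how many values are <= each limit
  (cs[k + 1] + L * (n - k)) / n # values above the limit each contribute L
}
all.equal(lev_rcpp(a, L), lev_cumsum(a, L)) # should be TRUE
Each limit then costs a binary search instead of a full pass over a, so after the sort the work is roughly O((length(a) + length(L)) * log(length(a))).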