More efficient function and for loop

I am trying to write a more efficient for loop. I know that sapply, lapply, etc. exist, but I don't know how to use them in my code.

Here is my function. I am not sure it is very efficient, and I think it could be improved, but I don't know how.

myfun <- function(a, b, c) {
  sum <- 0
  iter <- 0
  while (sum < c) {
    nr <- runif(1, a, b)
    sum <- sum + nr
    iter <- iter + 1
  }
  return(iter)
}

And here is the part where I would like to use sapply or something similar.

a <- 0
b <- 1
c <- 2
x <- 0
for (i in 1:10^9) {
  x <- x + myfun(a, b, c)
}

I also tried an sapply call similar to this:

sapply(1:10^9, functie(a ,b ,c))

But sapply passes the elements of 1:10^9 as the arguments, instead of a, b, c.

CodePudding user response：

I think replicate() is what you may be looking for (I changed your n to something smaller).

set.seed(1234)

n <- 10^2

y <- replicate(n, myfun(a,b,c))
sum(y)
# [1] 462

This matches your prior result.

set.seed(1234)

a <-0
b <-1
c <-2
x <-0
for (i in 1:n){
  x <- x + myfun(a,b,c)
}

x
# [1] 462
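
As for the sapply() version you asked about: replicate() is documented as a wrapper around sapply() for repeated evaluation of an expression, so the same thing can be written with sapply() directly by wrapping the call in an anonymous function that ignores its argument. A minimal sketch, assuming myfun() as in the question (repeated here so the snippet runs on its own; the names y_sapply, y_vapply and y_replicate are only for illustration):

```r
# myfun() from the question, repeated so this snippet is self-contained
myfun <- function(a, b, c) {
  sum <- 0
  iter <- 0
  while (sum < c) {
    nr <- runif(1, a, b)
    sum <- sum + nr
    iter <- iter + 1
  }
  return(iter)
}

a <- 0
b <- 1
c <- 2
n <- 10^2

set.seed(1234)
# The index i is ignored; it only controls how many times myfun() is called
y_sapply <- sapply(seq_len(n), function(i) myfun(a, b, c))

set.seed(1234)
# vapply() does the same but checks that every result is a single number
y_vapply <- vapply(seq_len(n), function(i) myfun(a, b, c), numeric(1))

set.seed(1234)
y_replicate <- replicate(n, myfun(a, b, c))

# With the same seed, all three should give the same draws
identical(y_sapply, y_replicate)
identical(y_vapply, y_replicate)
sum(y_sapply)
```

The original sapply(1:10^9, functie(a, b, c)) does not work because the second argument has to be a function, not the result of calling one; the anonymous function above is one way to supply that.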

CodePudding user response：

I would probably solve this using purrr::map(). E.g. like this:

library(purrr)  # provides map_dbl() and re-exports the %>% pipe

c(1:1e9) %>% 
  purrr::map_dbl(
    ~ myfun(a, b, c)
  ) %>% 
  sum()

This first calls myfun() the same number of times as the length of c(1:1e9), and stores the results in a numeric vector, then it uses sum() to add the results together.

My tests show it's a bit faster than using replicate().
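
One thing to keep in mind at n = 1e9: a numeric vector of that length needs roughly 8 GB (8 bytes per value) for the results alone, which may not fit in memory. A possible workaround is to accumulate the sum in chunks, so only one chunk of results exists at a time. This is only a sketch in base R; sum_in_chunks() and chunk_size are hypothetical names, not part of any package, and myfun() is repeated from the question:

```r
# myfun() from the question, repeated so this snippet is self-contained
myfun <- function(a, b, c) {
  sum <- 0
  iter <- 0
  while (sum < c) {
    nr <- runif(1, a, b)
    sum <- sum + nr
    iter <- iter + 1
  }
  return(iter)
}

# Hypothetical helper: call myfun() n times, but never hold more than
# chunk_size results in memory at once
sum_in_chunks <- function(n, chunk_size, a, b, c) {
  total <- 0
  done <- 0
  while (done < n) {
    k <- min(chunk_size, n - done)            # size of the next chunk
    total <- total + sum(replicate(k, myfun(a, b, c)))
    done <- done + k
  }
  total
}

# Small n so it runs quickly; the chunk size is deliberately not a divisor of n
set.seed(1234)
chunked <- sum_in_chunks(1000, 128, 0, 1, 2)

set.seed(1234)
direct <- sum(replicate(1000, myfun(0, 1, 2)))

# Same seed and same call order, so both totals should agree
chunked == direct
```

This does not make the individual calls any faster; it only bounds the memory used while summing.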

CodePudding user response：

You're doing it right, in my honest opinion. Since you don't need to return a vectorized or multi-dimensional result but instead update an existing object at each iteration, the for loop you're suggesting is more than adequate.

If you want to read a good discussion of this topic, I suggest you look at this link: https://r4ds.had.co.nz/iteration.html

Edit: just to address the speed argument

start <- Sys.time()
purrr::map_dbl(1:1000, function(x) y + myfun(a, b, c)) %>% sum
end <- Sys.time()
end - start

# Time difference of 0.02593184 secs

start <- Sys.time()
y <- replicate(1000, myfun(a,b,c))
cumsum(y)[1000]
end <- Sys.time()
end - start

# Time difference of 0.01755929 secs

y <- 0
start <- Sys.time()
for(i in 1:1000){
  y <- y + myfun(a,b,c)
}
end <- Sys.time()
end - start

# Time difference of 0.01459098 secs
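
Differences of a few milliseconds from a single Sys.time() pair are easily within noise, so treat the numbers above as a rough indication only. A slightly more robust check, sketched in base R with a larger n and system.time(), could look like the following. It assumes myfun() from the question (repeated so the snippet runs on its own); the names n, t_loop, t_rep, total_loop and total_rep are just for illustration, and the timings will vary by machine, so none are shown:

```r
# myfun() from the question, repeated so this snippet is self-contained
myfun <- function(a, b, c) {
  sum <- 0
  iter <- 0
  while (sum < c) {
    nr <- runif(1, a, b)
    sum <- sum + nr
    iter <- iter + 1
  }
  return(iter)
}

a <- 0
b <- 1
c <- 2
n <- 1e4  # larger than 1000 so the timings are less dominated by noise

# Plain for loop that updates a running total
set.seed(1)
t_loop <- system.time({
  total_loop <- 0
  for (i in seq_len(n)) {
    total_loop <- total_loop + myfun(a, b, c)
  }
})

# replicate() followed by sum()
set.seed(1)
t_rep <- system.time({
  total_rep <- sum(replicate(n, myfun(a, b, c)))
})

# Elapsed wall-clock time in seconds for each approach
t_loop[["elapsed"]]
t_rep[["elapsed"]]

# Same seed, so both approaches should produce the same total
total_loop == total_rep
```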

CodePudding user response：

Here is a recursive function f() that does the same job as myfun().

f <- function(s=0) {
  if (s[length(s)] >= 2) {
    return(length(s) - 1L)
  } else {
    f(c(s, s[length(s)] + runif(1, 0L, 1L)))
  }
}

set.seed(42)

f()
# [1] 3

replicate(8, f())
# [1] 4 5 4 4 3 5 3 5

stopifnot(all.equal({set.seed(42);f()}, {set.seed(42);myfun(0, 1, 2)}))

However (most likely because it is recursive), it's just cooler, not faster:

# Unit: milliseconds
#  expr      min       lq     mean   median       uq      max neval cld
#     f 21.57227 22.01614 23.61562 22.30010 26.18903 28.19850   100   b
# myfun 16.20270 16.52542 17.76446 16.70385 19.44336 22.15172   100  a 

set.seed(42); R <- 1e3
microbenchmark::microbenchmark(
  f=replicate(R, f()), myfun=replicate(R, myfun(0, 1, 2)), times=1e2L,
  control=list(warmup=1e1L))