I am trying to write a more efficient for loop. I know that sapply, lapply, etc. exist, but I don't know how to use them in my code.
I have a function, and I don't know whether it is efficient. I think it could be improved, but I don't know how.
myfun <- function(a, b, c) {
sum <- 0
iter <- 0
while (sum < c) {
nr <- runif(1, a, b)
sum <- sum + nr
iter <- iter + 1
}
return(iter)
}
And here is the part where I would like to use sapply or something similar:
a <- 0
b <- 1
c <- 2
x <- 0
for (i in 1:10^9) {
x <- x + myfun(a, b, c)
}
Also, I would like an sapply call similar to this:
sapply(1:10^9, functie(a ,b ,c))
But sapply passes the elements of 1:10^9 as the arguments instead of a, b, c.
CodePudding user response:
I think replicate() is what you may be looking for (I changed your n to something smaller).
set.seed(1234)
n <- 10^2
y <- replicate(n, myfun(a,b,c))
sum(y)
# [1] 462
This matches the result of your original for loop with the same seed:
set.seed(1234)
a <- 0
b <- 1
c <- 2
x <- 0
for (i in 1:n) {
x <- x + myfun(a, b, c)
}
x
# [1] 462
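For the sapply() form asked about in the question, one option is to wrap the call in an anonymous function that receives each index and ignores it. The sketch below is only an illustration: it repeats myfun() from the question (with the local variable renamed to total so that it does not mask base sum()), and the names y_sapply and y_replicate are made up for the comparison.

```r
# myfun() as defined in the question, with the additions written out.
myfun <- function(a, b, c) {
  total <- 0
  iter <- 0
  while (total < c) {
    total <- total + runif(1, a, b)
    iter <- iter + 1
  }
  iter
}

a <- 0
b <- 1
c <- 2
n <- 10^2

set.seed(1234)
# The anonymous function is called once per index i, but i is never used,
# so a, b and c are what actually reach myfun().
y_sapply <- sapply(seq_len(n), function(i) myfun(a, b, c))

set.seed(1234)
y_replicate <- replicate(n, myfun(a, b, c))

# Both approaches draw the same random numbers in the same order.
identical(y_sapply, y_replicate)
sum(y_sapply)
```

replicate() does essentially this wrapping for you, which is why it reads more naturally when the index is not needed.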
CodePudding user response:
I would probably solve this using purrr's map functions, e.g. purrr::map_dbl():
library(magrittr)  # provides the %>% pipe
c(1:1e9) %>%
purrr::map_dbl(
~ myfun(a, b, c)
) %>%
sum()
This first calls myfun() the same number of times as the length of c(1:1e9) and stores the results in a numeric vector; it then uses sum() to add the results together.
My tests show it's a bit faster than using replicate().
CodePudding user response:
You're doing it right, in my honest opinion. Since you don't need to return a vectorized or multi-dimensional result but instead update an existing object at each iteration, the for loop you're suggesting is more than adequate.
If you want to read a good discussion of this topic, I suggest this link: https://r4ds.had.co.nz/iteration.html
Edit: just to address the speed argument:
library(magrittr)  # for %>%
y <- 0  # y must be a scalar for the anonymous function below
start <- Sys.time()
purrr::map_dbl(1:1000, function(x) y + myfun(a, b, c)) %>% sum
end <- Sys.time()
end - start
# Time difference of 0.02593184 secs
start <- Sys.time()
y <- replicate(1000, myfun(a,b,c))
cumsum(y)[1000]
end <- Sys.time()
end - start
# Time difference of 0.01755929 secs
y <- 0
start <- Sys.time()
for(i in 1:1000){
y <- y + myfun(a, b, c)
}
end <- Sys.time()
end - start
# Time difference of 0.01459098 secs
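Since the three variants above take similar time, most of the cost probably sits inside myfun() itself rather than in the outer loop. One possible direction, which is not taken from any of the answers here, is to draw several uniforms per runif() call and find the crossing point with cumsum(). The sketch below is an assumption-laden illustration: myfun_batch and its parameter k are hypothetical names, and it assumes c > 0 and b > 0 (for c <= 0 the original returns 0, while this version returns 1).

```r
# Hypothetical batched variant of myfun(): draws k uniforms at a time.
# Assumes c > 0 and b > 0 so that the running total eventually reaches c.
myfun_batch <- function(a, b, c, k = 16L) {
  total <- 0  # running sum carried over from earlier batches
  iter <- 0   # number of draws counted in earlier batches
  repeat {
    cs <- total + cumsum(runif(k, a, b))
    hit <- which(cs >= c)
    if (length(hit) > 0L) {
      # First position in this batch where the running sum reaches c.
      return(iter + hit[1L])
    }
    total <- cs[k]
    iter <- iter + k
  }
}

set.seed(1234)
y_batch <- replicate(1000, myfun_batch(0, 1, 2))
sum(y_batch)
```

Each call consumes k random numbers even when fewer are needed, so across repeated calls the results will not match myfun() draw for draw under the same seed, although the distribution of the counts should be the same. Whether this is actually faster depends on k and on the parameters, so it is worth benchmarking before relying on it.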
CodePudding user response:
Here is a recursive function f() that does the same job as myfun().
f <- function(s=0) {
if (s[length(s)] >= 2) {
return(length(s) - 1L)
} else {
f(c(s, s[length(s)] + runif(1, 0L, 1L)))
}
}
set.seed(42)
f()
# [1] 3
replicate(8, f())
# [1] 4 5 4 4 3 5 3 5
stopifnot(all.equal({set.seed(42);f()}, {set.seed(42);myfun(0, 1, 2)}))
However, and most likely because it is recursive, it's just cooler, not faster:
set.seed(42); R <- 1e3
microbenchmark::microbenchmark(
f=replicate(R, f()), myfun=replicate(R, myfun(0, 1, 2)), times=1e2L,
control=list(warmup=1e1L))
# Unit: milliseconds
#   expr      min       lq     mean   median       uq      max neval cld
#      f 21.57227 22.01614 23.61562 22.30010 26.18903 28.19850   100   b
#  myfun 16.20270 16.52542 17.76446 16.70385 19.44336 22.15172   100  a