The question is: Does R code run faster in a function?
Consider the following examples:
> start<-Sys.time()
> for(i in 1:10000){}
> Sys.time()-start
Time difference of 0.01399994 secs
>
> fn<-function(){
start<-Sys.time()
for(i in 1:10000){}
Sys.time()-start
}
> fn()
Time difference of 0.00199604 secs
> start<-Sys.time()
> for(i in 1:10000){x<-100}
> Sys.time()-start
Time difference of 0.012995 secs
>
> fn<-function(){
start<-Sys.time()
for(i in 1:10000){x<-100}
Sys.time()-start
}
> fn()
Time difference of 0.008996964 secs
The result is the same after increasing the number of iterations, as shown below:
> sim<-10000000
> start<-Sys.time()
> for(i in 1:sim){x<-i}
> Sys.time()-start
Time difference of 2.832 secs
>
> fn<-function(){
start<-Sys.time()
for(i in 1:sim){x<-i}
Sys.time()-start
}
> fn()
Time difference of 2.017997 secs
I would guess it's not a coincidence!
CodePudding user response:
Functions in R are byte-compiled by the JIT compiler. Once a function has been compiled, it will usually run faster.
As the docs in ?compiler::enableJIT say,
JIT is disabled if the argument is 0. If level is 1 then larger closures are compiled before their first use. If level is 2, then some small closures are also compiled before their second use. If level is 3 then in addition all top level loops are compiled before they are executed. JIT level 3 requires the compiler option optimize to be 2 or 3. The JIT level can also be selected by starting R with the environment variable R_ENABLE_JIT set to one of these values. Calling enableJIT with a negative argument returns the current JIT level. The default JIT level is 3.
As a result, many functions will run faster than the same code at top level.
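To see what the documentation describes on your own machine, here is a minimal sketch that reports the current JIT level and byte-compiles a function by hand with compiler::cmpfun(). The function count_up is a made-up example, not something from the question.

```r
# Sketch: inspect the JIT level and byte-compile a function explicitly.
library(compiler)

# A negative argument only reports the current level; it changes nothing.
jit_level <- enableJIT(-1)
cat("Current JIT level:", jit_level, "\n")

# Hypothetical example function (an assumption, not from the question).
count_up <- function(n) {
  total <- 0
  for (i in seq_len(n)) total <- total + i
  total
}

# cmpfun() returns a byte-compiled copy without waiting for the JIT.
count_up_compiled <- cmpfun(count_up)

# Both versions compute the same value; only how they are executed differs.
stopifnot(identical(count_up(1000), count_up_compiled(1000)))
```

Compiling by hand like this is mostly useful for experiments, since with the default JIT level the compilation normally happens automatically.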
CodePudding user response:
Credit and upvotes should go to @user2554330 for finding the reason.
To demonstrate the JIT behaviour, I used this benchmark:
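library(microbenchmark)
# Switch the JIT compiler off before defining the function
compiler::enableJIT(0)
fn <- function() {
  for(i in 1:10000) {}
}
# Compare the bare loop with the same loop wrapped in a function
microbenchmark(for_loop_without_func = for(i in 1:10000) {},
               for_loop_in_func = fn(),
               times = 100)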
With the JIT disabled, the execution times are nearly the same:
Unit: microseconds
                  expr     min       lq     mean   median      uq     max neval
 for_loop_without_func 180.619 180.7990 182.7129 180.9290 181.050 239.489   100
      for_loop_in_func 182.582 182.7075 186.2232 182.7625 182.938 309.912   100
With compiler::enableJIT(3) (which is the default), the function is faster:
Unit: microseconds
                  expr     min       lq      mean   median       uq      max neval
 for_loop_without_func 558.727 574.4875 659.21931 657.3425 702.6475 1984.351   100
      for_loop_in_func  53.019  53.4955  61.59588  53.7260  54.0320  790.632   100
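If you do not have microbenchmark installed, the comparison can be sketched in one script with base R's system.time(). This is only a rough approximation of the benchmark above: the iteration count n and the name loop_in_fn are my own choices, a single system.time() call is much noisier than 100 repetitions, and the numbers will differ from machine to machine.

```r
# Sketch: time a bare loop and the same loop inside a function,
# first with the JIT off (level 0), then with it on (level 3).
library(compiler)

n <- 1e6                        # assumed iteration count, adjust as needed
old_level <- enableJIT(0)       # returns the level that was active before

# Define the function only after changing the level, so that it has
# not already been compiled under the previous setting.
loop_in_fn <- function(n) for (i in seq_len(n)) {}
t_top_off <- system.time(for (i in seq_len(n)) {})[["elapsed"]]
t_fn_off  <- system.time(loop_in_fn(n))[["elapsed"]]

invisible(enableJIT(3))
loop_in_fn <- function(n) for (i in seq_len(n)) {}   # fresh, uncompiled closure
for (k in 1:2) loop_in_fn(10)   # warm-up calls so the JIT has a chance to compile
t_top_on <- system.time(for (i in seq_len(n)) {})[["elapsed"]]
t_fn_on  <- system.time(loop_in_fn(n))[["elapsed"]]

invisible(enableJIT(old_level)) # restore whatever level was set before

timings <- data.frame(
  jit_level    = c(0, 3),
  bare_loop    = c(t_top_off, t_top_on),
  loop_in_func = c(t_fn_off, t_fn_on)
)
print(timings)
```

Redefining loop_in_fn after each enableJIT() call matters, because a closure that has already been compiled stays compiled even if the JIT is switched off afterwards.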