To speed things up, I'm trying to convert a simple 'for loop' in R into an Rcpp one.
I have a date vector named "date_vector" that consists of X identical dates. At each iteration i, I add 1 minute to the previous date_vector value. The R 'for loop' (see below) works properly, but it is too slow for my very large dataset (2 years, about 1 million rows).
I've read that Rcpp could be a solution to speed up the loop. However, I'm an 'Rcpp' noob and I'm struggling to convert my loop.
Can someone help me and explain the solution? Thank you very much! Best wishes for 2023.
The original R loop:
for (i in 2:nrow(klines)) {
  date_vector[i] <- date_vector[i-1] + minutes(1)
}
My Rcpp loop attempt:
cpp_update_date_vector <- cppFunction('DateVector fonction_test(DateVector zz),
int n = zz.size();
DateVector = date_vector;
for (int i = 0; i < n; i++) {
date_vector[i] = date_vector[i-1] + 60;
}
')
Answer:
You can likely achieve your goal without a loop at all. It sounds like you’re trying to change a vector of identical datetimes to a sequence one minute apart, right? If so, you could do:
library(lubridate)
date_vector <- rep(ymd_hms("2020-01-01 12:00:00"), 10)
date_vector + minutes(seq_along(date_vector) - 1)
[1] "2020-01-01 12:00:00 UTC" "2020-01-01 12:01:00 UTC"
[3] "2020-01-01 12:02:00 UTC" "2020-01-01 12:03:00 UTC"
[5] "2020-01-01 12:04:00 UTC" "2020-01-01 12:05:00 UTC"
[7] "2020-01-01 12:06:00 UTC" "2020-01-01 12:07:00 UTC"
[9] "2020-01-01 12:08:00 UTC" "2020-01-01 12:09:00 UTC"
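If you would rather not depend on lubridate, the same vectorised idea can be sketched in base R. This is only a minimal sketch, assuming date_vector is a POSIXct vector (which is what ymd_hms returns). POSIXct values are stored as seconds since the epoch, so adding 60 moves a value forward by one minute. The start time and the length of 10 are illustrative, and the names offsets and result are made up for the example.

```r
# Illustrative input: 10 identical date-times (values assumed for the example)
date_vector <- rep(as.POSIXct("2020-01-01 12:00:00", tz = "UTC"), 10)

# Offsets of 0, 60, 120, ... seconds, one per element
offsets <- 60 * (seq_along(date_vector) - 1)

# Adding a numeric vector to a POSIXct vector adds seconds element-wise
result <- date_vector + offsets
```

Because the offsets are computed for all elements at once, no explicit loop is needed, whatever the length of the vector.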
Answer:
For completeness, here is how you would write the code in Rcpp:
cpp_update_date_vector <- Rcpp::cppFunction('
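  DatetimeVector fonction_test(DatetimeVector zz) {
    // Start at 1 so that zz[i-1] is always a valid index
    for (int i = 1; i < zz.size(); i++) {
      // Date-times are stored as seconds, so + 60 adds one minute
      zz[i] = zz[i-1] + 60;
    }
    return zz;
  }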
')
But it is no faster than base R's seq function, which can easily create a sequence of date-times 1 minute apart. Here is a comparison of the methods on a 1,000,000-length date-time vector. Note that Rcpp and base R are comparable, and both are considerably faster than lubridate.
microbenchmark::microbenchmark(
lubridate = big_vec + lubridate::minutes(seq_along(big_vec) - 1),
Rcpp = cpp_update_date_vector(big_vec),
base_R = seq(big_vec[1], by = "1 min", length = 1000000)
)
#> Unit: milliseconds
#>       expr      min       lq     mean   median       uq      max neval cld
#>  lubridate 1168.921 1203.845 1318.950 1215.465 1570.376 1691.765   100   b
#>       Rcpp    3.733    3.770    8.742    3.799    3.909  467.236   100  a
#>     base_R    2.172    2.338    3.167    2.407    2.484   40.222   100  a
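The benchmark does not show how big_vec was created. One plausible setup, offered purely as an assumption, is a vector of 1,000,000 identical date-times. The sketch below builds such a vector and checks that the base R seq call yields a sequence of the same length with one-minute steps. The start time and the names n, same_length and steps_ok are hypothetical.

```r
# Hypothetical setup: the original definition of big_vec is not shown
n <- 1000000
big_vec <- rep(as.POSIXct("2020-01-01 12:00:00", tz = "UTC"), n)

# Same call shape as the base_R benchmark entry, with the length taken from the input
base_R <- seq(big_vec[1], by = "1 min", length.out = length(big_vec))

# Sanity checks: same length as the input, and every step is exactly 60 seconds
same_length <- length(base_R) == length(big_vec)
steps_ok <- all(diff(as.numeric(base_R)) == 60)
```

Taking length.out from length(big_vec) rather than hard-coding 1000000 keeps the call correct if the input size changes.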