I am new to Programming and got stuck in it. I wanted to calculate the hourly temperature variation of an object throughout the year using some variables, which changes in every hour. The original data contains 60 columns and 8760 rows for the calculation.
I got the desired output using the for
loop, but the model is taking a lot of time for the calculation. I wonder if there is any way to replace the loop with functions, which I suspect, can also increase the speed of the calculations.
Here is a small reproducible example to show what I did.
table <- data.table("A" = c(1), "B" = c(1:5), "C" = c(10))
table
A B C
1: 1 1 10
2: 1 2 10
3: 1 3 10
4: 1 4 10
5: 1 5 10
The for
loop
for (j in (2: nrow(table))) {
table$A[j] = (table$A[j-1] table$B[j-1]) * table$B[j]
table$C[j] = table$B[j] * table$A[j]
}
I got the output as I desired:
A B C
1: 1 1 10
2: 4 2 8
3: 18 3 54
4: 84 4 336
5: 440 5 2200
but it took 15 min to run the whole program in my case (not this!)
So I tried to use function instead of the for
loop.
I tried this:
table <- data.table("A" = c(1), "B" = c(1:5), "C" = c(10))
myfun <- function(df){
df = df %>% mutate(A = (lag(A) lag(B)) * B,
C = B * A)
return(df)
}
myfun(table)
But the output was
A B C
1 NA 1 NA
2 4 2 8
3 9 3 27
4 16 4 64
5 25 5 125
As it seems that the function refers to the rows of the first table not the updated rows after the calculation. Is there any way to obtain the desired output using functions? It is my first R project, any help is very much appreciated. Thank you.
CodePudding user response:
A much faster alternative using data.table
. Note that the calculation of C
can be separated from the calculation of A
so we can do less within the loop:
for (i in 2:nrow(table)) {
set(table, i = i, j = "A", value = with(table, (A[i-1] B[i-1]) * B[i]))
}
table[-1, C := A * B]
table
# A B C
# <num> <int> <num>
# 1: 1 1 10
# 2: 4 2 8
# 3: 18 3 54
# 4: 84 4 336
# 5: 440 5 2200
CodePudding user response:
Here's a solution using purrr::accumulate2
which lets you use the result of the previous computation as the input to the next one:
library(data.table)
library(purrr)
library(magrittr)
table <- data.table("A" = c(1), "B" = c(1:5), "C" = c(10))
table$A <- accumulate2(
table$A,
seq(table$A),
~ (..1 table$B[..3]) * table$B[..3 1],
.init = table$A[1]
) %>%
unlist() %>%
extract(1:nrow(table))
table$C <- table$B * table$A
table
# A B C
# 1: 1 1 1
# 2: 4 2 8
# 3: 18 3 54
# 4: 84 4 336
# 5: 440 5 2200
CodePudding user response:
You can try Reduce
like below (given dt <- data.table("A" = c(1), "B" = c(1:5), "C" = c(10))
)
dt[
,
A := Reduce(function(x, Y) (x Y[2]) * Y[1],
asplit(embed(B, 2), 1),
init = A[1],
accumulate = TRUE
)
][
,
C := A * B
]
which updates dt
as
> dt
A B C
1: 1 1 1
2: 4 2 8
3: 18 3 54
4: 84 4 336
5: 440 5 2200