How to use functions to do a recursive calculation in data.table/R?

Time:11-11

I am new to programming and got stuck. I want to calculate the hourly temperature variation of an object throughout the year using several variables that change every hour. The original data contains 60 columns and 8760 rows.

I got the desired output using a for loop, but the model takes a long time to run. I wonder whether there is a way to replace the loop with functions, which I suspect could also speed up the calculation.

Here is a small reproducible example to show what I did.

library(data.table)

table <- data.table("A" = c(1), "B" = c(1:5), "C" = c(10))

table
   A B  C
1: 1 1 10
2: 1 2 10
3: 1 3 10
4: 1 4 10
5: 1 5 10

The for loop:

for (j in 2:nrow(table)) {
  table$A[j] = (table$A[j-1] + table$B[j-1]) * table$B[j]
  table$C[j] = table$B[j] * table$A[j]
}

I got the output as I desired:

     A B    C
1:   1 1   10
2:   4 2    8
3:  18 3   54
4:  84 4  336
5: 440 5 2200

but it took 15 minutes to run the whole program on my real data (not this example).

So I tried to use a function instead of the for loop:

table <- data.table("A" = c(1), "B" = c(1:5), "C" = c(10))


library(dplyr)

myfun <- function(df){
  df = df %>% mutate(A = (lag(A) + lag(B)) * B, 
                     C = B * A)
  return(df)
}

myfun(table)

But the output was

   A B   C
1 NA 1  NA
2  4 2   8
3  9 3  27
4 16 4  64
5 25 5 125

It seems that the function refers to the rows of the original table, not the rows updated during the calculation. Is there any way to obtain the desired output using functions? This is my first R project, so any help is very much appreciated. Thank you.
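The behaviour described above can be reproduced in base R. The following is only a minimal sketch with a hand-written lag (the names lag_A, lag_B and vectorised are illustrative, not from the post): the whole lagged column is built from the original values before any multiplication happens, so no row ever sees an updated A.

```r
# Original columns from the example
A <- rep(1, 5)
B <- 1:5

# Hand-written lag: shift each vector down by one position, padding with NA
lag_A <- c(NA, head(A, -1))
lag_B <- c(NA, head(B, -1))

# Vectorised evaluation: every element uses the ORIGINAL A (always 1),
# never the value computed for the previous row
vectorised <- (lag_A + lag_B) * B
print(vectorised)
```

A true recurrence needs each step to read the result of the previous step, which is what the answers below arrange in different ways.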

CodePudding user response:

A much faster alternative using data.table. Note that the calculation of C can be separated from the calculation of A so we can do less within the loop:

for (i in 2:nrow(table)) {
  set(table, i = i, j = "A", value = with(table, (A[i-1] + B[i-1]) * B[i]))
}
table[-1, C := A * B]
table

#        A     B     C
#    <num> <int> <num>
# 1:     1     1    10
# 2:     4     2     8
# 3:    18     3    54
# 4:    84     4   336
# 5:   440     5  2200
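Since the question asks for a function, the same recurrence could also be wrapped in a small helper that works on plain vectors and is assigned back in one step. This is a sketch under assumptions, not part of the answer above: recur_a is a hypothetical name, and whether it is faster than set() on the real 8760-row data has not been measured here.

```r
library(data.table)

# Hypothetical helper: A[j] = (A[j-1] + B[j-1]) * B[j], computed on plain vectors
recur_a <- function(a1, b) {
  n <- length(b)
  a <- numeric(n)          # preallocate the result
  a[1] <- a1               # starting value
  for (j in seq_len(n)[-1]) {
    a[j] <- (a[j - 1] + b[j - 1]) * b[j]
  }
  a
}

dt <- data.table(A = 1, B = 1:5, C = 10)
dt[, A := recur_a(A[1], B)]   # fill A in one assignment
dt[-1, C := A * B]            # C depends only on the finished A; row 1 keeps its value
print(dt)
```

The loop still runs row by row, but it touches only atomic vectors, and the table is modified once per column rather than once per row.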

CodePudding user response:

Here's a solution using purrr::accumulate2, which lets you use the result of the previous computation as the input to the next one:

library(data.table)
library(purrr)
library(magrittr)

table <- data.table("A" = c(1), "B" = c(1:5), "C" = c(10))

table$A <- accumulate2(
  table$A,
  seq(table$A),
  ~ (..1 + table$B[..3]) * table$B[..3 + 1],
  .init = table$A[1]
) %>%
  unlist() %>%
  extract(1:nrow(table))
  
table$C <- table$B * table$A

table
#      A B    C
# 1:   1 1    1
# 2:   4 2    8
# 3:  18 3   54
# 4:  84 4  336
# 5: 440 5 2200

CodePudding user response:

You can try Reduce like below (given dt <- data.table("A" = c(1), "B" = c(1:5), "C" = c(10)))

dt[
  ,
  A := Reduce(function(x, Y) (x + Y[2]) * Y[1],
    asplit(embed(B, 2), 1),
    init = A[1],
    accumulate = TRUE
  )
][
  ,
  C := A * B
]

which updates dt as

> dt
     A B    C
1:   1 1    1
2:   4 2    8
3:  18 3   54
4:  84 4  336
5: 440 5 2200
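To see why Y[1] plays the role of the current B and Y[2] the previous B, it may help to look at what embed() produces on its own. The following base R sketch uses illustrative names (pairs, a_values) that are not in the answer:

```r
B <- 1:5

# embed(B, 2) returns one row per step j = 2..5, laid out as c(B[j], B[j-1])
pairs <- embed(B, 2)
print(pairs)

# asplit(pairs, 1) turns the rows into a list, so Reduce receives one pair per step
a_values <- Reduce(
  function(x, Y) (x + Y[2]) * Y[1],  # x is the previous A
  asplit(pairs, 1),
  1,                                 # starting value A[1]
  accumulate = TRUE
)
a_values <- unlist(a_values)
print(a_values)
```

With accumulate = TRUE, Reduce keeps every intermediate result, which is why the output has one value per row of the table.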