Optimize loops for/if in R

Time:07-30

I am having trouble running the code below because it takes so long to execute.

df has 758k rows

x <- data.table(df$variable1) # day of week (1-7 integer) from case 1
y <- data.table(df$variable2) # day of week (1-7 integer) from case 2
z <- data.table(df$variable3) # date object (ymd)
w <- data.table(as.Date())

for(i in 1:nrow(df)){
  if(x[i,1]<y[i,1]){
    w[i,1] <- y[i,1]-x[i,1] + z[i,1]
  } else if(x[i,1]>y[i,1]){
    w[i,1] <- 7-(x[i,1]-y[i,1]) + z[i,1]
  } else {
    w[i,1] <- z[i,1]
  }
}

Running this code to build the data.table "w" from the for/if logic takes 60-70 minutes. Does anyone have a faster solution?

CodePudding user response:

This should be more idiomatic, but I might have made a syntax error and can't confirm it works (or how much faster) without some sample data.

# ifelse() drops the Date class, so compute the day offset first and add it to z
df$w = z + ifelse(x < y, y - x, ifelse(x > y, 7 - (x - y), 0))

or with dplyr:

library(dplyr)
df %>% mutate(w = case_when(x < y ~ y - x + z,
                            x > y ~ 7 - (x - y) + z,
                            TRUE  ~ z))

CodePudding user response:

I recommend you work directly on df after converting it to a data.table. fcase is one way to vectorize the operation:

library(data.table)

n <- 758e3L

df <- data.frame(
  x = sample(7, n, TRUE),
  y = sample(7, n, TRUE),
  z = sample(seq.Date(as.Date('2021/01/01'), as.Date('2021/12/31'), 1), n, TRUE)
)
system.time({
  setDT(df)[
    , w := fcase(
      x < y, y - x + z,
      x > y, 7 - x + y + z,
      x == y, z
    )
  ]
})
#>    user  system elapsed 
#>    0.00    0.02    0.02

head(df)
#>    x y          z          w
#> 1: 6 6 2021-11-22 2021-11-22
#> 2: 5 2 2021-08-08 2021-08-12
#> 3: 7 6 2021-06-12 2021-06-18
#> 4: 2 7 2021-08-19 2021-08-24
#> 5: 2 3 2021-02-01 2021-02-02
#> 6: 6 7 2021-03-12 2021-03-13
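
As a side note, the three branches appear to collapse into a single modulo expression: the offset is the number of days forward from weekday x to weekday y, which is (y - x) %% 7 in R because %% returns a non-negative result for a positive divisor. The following is only a sketch on made-up sample data (the vectors and the seed are assumptions, not the asker's data), using base R alone, with a check against the original three-branch rule:

```r
# Hypothetical sample data; the names x, y, z follow the answer above.
set.seed(1)
n <- 1000L
x <- sample(7L, n, TRUE)                             # weekday of case 1 (1-7)
y <- sample(7L, n, TRUE)                             # weekday of case 2 (1-7)
z <- as.Date("2021-01-01") + sample(0:364, n, TRUE)  # base date

# Days forward from weekday x to weekday y, always in 0..6.
w_mod <- z + (y - x) %% 7L

# Reference: the three-branch rule from the question, applied to the offset
# only so that ifelse() does not strip the Date class from z.
w_ref <- z + ifelse(x < y, y - x, ifelse(x > y, 7L - (x - y), 0L))

# Stops with an error if the two approaches ever disagree.
stopifnot(all(w_mod == w_ref))
```

If that equivalence holds for your data, the data.table version would reduce to something like setDT(df)[, w := z + (y - x) %% 7L], with no branching at all.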