I am having trouble running this code because it takes far too long.
df has 758k rows.
x <- data.table(df$variable1) # day of week (1-7 integer) from case 1
y <- data.table(df$variable2) # day of week (1-7 integer) from case 2
z <- data.table(df$variable3) # date object (ymd)
w <- data.table(as.Date())
for(i in 1:nrow(df)){
  if(x[i,1]<y[i,1]){
    w[i,1] <- y[i,1]-x[i,1] + z[i,1]
  } else if(x[i,1]>y[i,1]){
    w[i,1] <- 7-(x[i,1]-y[i,1]) + z[i,1]
  } else {
    w[i,1] <- z[i,1]
  }
}
Running this code to build the data.table "w" from the for/if logic takes 60-70 minutes. Does anyone have a divine solution?
CodePudding user response:
This should be more idiomatic, but I might have made a syntax error and can't confirm that it works (or how much faster it is) without some sample data.
df$w = ifelse(x < y, y - x + z, ifelse(x > y, 7 - (x - y) + z, z))
or with dplyr:
library(dplyr)
df %>% mutate(w = case_when(x < y ~ y - x + z,
                            x > y ~ 7 - (x - y) + z,
                            TRUE ~ z))
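One caveat with the base ifelse() version: ifelse() drops the Date class, so the result comes back as a plain number (days since 1970-01-01) rather than a date. A minimal sketch of converting it back, assuming x, y and z are plain vectors; the toy values below are made up for illustration and are not from the asker's data:

```r
# Toy inputs (made up for illustration): x and y are weekdays 1-7, z is a Date vector
x <- c(6L, 5L, 2L)
y <- c(6L, 2L, 7L)
z <- as.Date(c("2021-11-22", "2021-08-08", "2021-08-19"))

# ifelse() strips attributes, so the result is numeric, not Date
w_num <- ifelse(x < y, y - x + z, ifelse(x > y, 7 - (x - y) + z, z))
class(w_num)

# Restore the Date class explicitly
w <- as.Date(w_num, origin = "1970-01-01")
w
```

The case_when version should not need this step, since all three branches are already Date values.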
CodePudding user response:
I recommend you work directly on df after converting it to a data.table. fcase is one way to vectorize the operation:
library(data.table)
n <- 758e3L
df <- data.frame(
  x = sample(7, n, TRUE),
  y = sample(7, n, TRUE),
  z = sample(seq.Date(as.Date('2021/01/01'), as.Date('2021/12/31'), 1), n, TRUE)
)
system.time({
  setDT(df)[
    , w := fcase(
      x < y, y - x + z,
      x > y, 7 - x + y + z,
      x == y, z
    )
  ]
})
#>    user  system elapsed
#>    0.00    0.02    0.02
head(df)
#>    x y          z          w
#> 1: 6 6 2021-11-22 2021-11-22
#> 2: 5 2 2021-08-08 2021-08-12
#> 3: 7 6 2021-06-12 2021-06-18
#> 4: 2 7 2021-08-19 2021-08-24
#> 5: 2 3 2021-02-01 2021-02-02
#> 6: 6 7 2021-03-12 2021-03-13
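As a side note, the three branches appear to collapse into a single modular expression: the offset is (y - x) mod 7, which equals y - x when x < y, 7 - (x - y) when x > y, and 0 when they are equal. The sketch below checks this against fcase on made-up data; the column names x, y, z follow the example above, and the names dt, w_mod and w_ref are hypothetical:

```r
library(data.table)

set.seed(1)  # arbitrary seed, only so the toy data is reproducible
n <- 1000L
dt <- data.table(
  x = sample(7, n, TRUE),
  y = sample(7, n, TRUE),
  z = sample(seq.Date(as.Date("2021-01-01"), as.Date("2021-12-31"), 1), n, TRUE)
)

# Branch-free version: %% in R returns a non-negative result for a positive modulus
dt[, w_mod := z + (y - x) %% 7L]

# Reference version using fcase, mirroring the three branches
dt[, w_ref := fcase(
  x < y, y - x + z,
  x > y, 7 - x + y + z,
  x == y, z
)]

# TRUE if both versions agree on every row
all(dt$w_mod == dt$w_ref)
```

If the two columns agree on your real data as well, the one-liner avoids evaluating any conditions at all, though fcase is arguably easier to read when the branches have a business meaning.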