wondering if there was an efficient and terse/concise data.table solution to the following problem.
Please suppose that I have the following data.table:
library(data.table)
DT <- data.table(store = c("A", "A", "A", "A", "B", "B", "B", "B"),
time = c(1,2,3,4,1,2,3,4),
treat_time = c(0,0,1,0, 0,1,0,0))
Here, treat_time
is the time period where the store receives a treatment. Note that the time of the treatment is different between stores A and stores B. I would like to create a column time_rel
, which describes the time period relative to when treat_time = 1. That is, the data.table should look like:
DT_outcome <- data.table(store = c("A", "A", "A", "A", "B", "B", "B", "B"),
time = c(1,2,3,4,1,2,3,4),
treat_time = c(0,0,1,0, 0,1,0,0),
time_rel = c(-1,0,1,2, 0,1,2,3))
store time treat_time time_rel
<char> <num> <num> <num>
1: A 1 0 -1
2: A 2 0 0
3: A 3 1 1
4: A 4 0 2
5: B 1 0 0
6: B 2 1 1
7: B 3 0 2
8: B 4 0 3
Thanks!
CodePudding user response:
We can subtract from the sequence
library(data.table)
DT[, time_rel := seq_len(.N) - seq_len(.N)[treat_time == 1] 1 ,store]
-output
> DT
store time treat_time time_rel
<char> <num> <num> <num>
1: A 1 0 -1
2: A 2 0 0
3: A 3 1 1
4: A 4 0 2
5: B 1 0 0
6: B 2 1 1
7: B 3 0 2
8: B 4 0 3
Or the same logic in dplyr
library(dplyr)
DT %>%
group_by(store) %>%
mutate(time_rel = row_number() - which(treat_time == 1) 1) %>%
ungroup