I have a Dataframe Below DFSurv, I would like to create the column Event which :
if TF[i]==TF[i-1] then Event[i] = Event[i-1]
else Event[i] = Event[i-1] 1
this should be done for each group prov
, and the lag is computed ordering by column Per
.
The main idea is to add one whenever TF values are changing.
set.seed(1)
DFSurv = data.frame(Per = c(1:10,1:10,1:10, 1:10),
prov = c(rep("A",10),rep("B",10), rep("D",10),rep("F",10)),
TF = sample(0:1,size=40,replace=TRUE))
When using dplyr::lag
I am getting wrong results:
DFSurv %>% mutate(Event = 0) %>%
arrange(prov, Per) %>%
group_by(prov) %>%
mutate(Event = if_else(TF == dplyr::lag(TF, default =0),
dplyr::lag(Event, default =0),
dplyr::lag(Event, default =0) 1))
# A tibble: 40 x 4
# Groups: prov [4]
Per prov TF Event
<int> <chr> <int> <dbl>
1 1 A 0 0
2 2 A 1 1
3 3 A 0 1
4 4 A 0 0
5 5 A 1 1
6 6 A 0 1
7 7 A 0 0
8 8 A 0 0
9 9 A 1 1
10 10 A 1 0
# ... with 30 more rows
These results are wrong because Event[3] TF[3] != TF[2]
thus the value should be Event[2] 1
which is 2.
This can be done with loops but a vectorized approach is preferred.
CodePudding user response:
The essence to solve your problem is cumsum
.
Note that my set.seed
result is different from yours.
library(dplyr)
set.seed(1)
DFSurv = data.frame(Per = c(1:10,1:10,1:10, 1:10),
prov = c(rep("A",10),rep("B",10), rep("D",10),rep("F",10)),
TF = sample(0:1,size=40,replace=TRUE))
DFSurv %>%
group_by(prov) %>%
mutate(Event = cumsum(abs(c(0, diff(TF)))))
#> # A tibble: 40 × 4
#> # Groups: prov [4]
#> Per prov TF Event
#> <int> <chr> <int> <dbl>
#> 1 1 A 0 0
#> 2 2 A 1 1
#> 3 3 A 0 2
#> 4 4 A 0 2
#> 5 5 A 1 3
#> 6 6 A 0 4
#> 7 7 A 0 4
#> 8 8 A 0 4
#> 9 9 A 1 5
#> 10 10 A 1 5
#> # … with 30 more rows
Created on 2022-05-01 by the reprex package (v2.0.1)
CodePudding user response:
Try this:
library(tidyverse)
set.seed(1)
DFSurv <- data.frame(
Per = c(1:10, 1:10, 1:10, 1:10),
prov = c(rep("A", 10), rep("B", 10), rep("D", 10), rep("F", 10)),
TF = sample(0:1, size = 40, replace = TRUE)
)
DFSurv %>%
arrange(prov, Per) %>%
group_by(prov) %>%
mutate(event = if_else(TF != lag(TF) & !is.na(lag(TF)), 1, 0),
event_cum = cumsum(event))
#> # A tibble: 40 × 5
#> # Groups: prov [4]
#> Per prov TF event event_cum
#> <int> <chr> <int> <dbl> <dbl>
#> 1 1 A 0 0 0
#> 2 2 A 1 1 1
#> 3 3 A 0 1 2
#> 4 4 A 0 0 2
#> 5 5 A 1 1 3
#> 6 6 A 0 1 4
#> 7 7 A 0 0 4
#> 8 8 A 0 0 4
#> 9 9 A 1 1 5
#> 10 10 A 1 0 5
#> # … with 30 more rows
Created on 2022-05-01 by the reprex package (v2.0.1)