Home > OS >  How to enumerate each time there is a change in value of a dataframe column?
How to enumerate each time there is a change in value of a dataframe column?

Time:05-02

I have a Dataframe Below DFSurv, I would like to create the column Event which :

if TF[i]==TF[i-1] then Event[i] = Event[i-1] 
else Event[i] = Event[i-1]   1

this should be done for each group prov, and the lag is computed ordering by column Per.

The main idea is to add one whenever TF values are changing.

set.seed(1)
DFSurv = data.frame(Per = c(1:10,1:10,1:10, 1:10),
                    prov = c(rep("A",10),rep("B",10), rep("D",10),rep("F",10)),
                    TF = sample(0:1,size=40,replace=TRUE))

When using dplyr::lag I am getting wrong results:

DFSurv %>% mutate(Event = 0) %>%
  arrange(prov, Per) %>%
  group_by(prov) %>%
  mutate(Event = if_else(TF == dplyr::lag(TF, default =0), 
                         dplyr::lag(Event, default =0), 
                         dplyr::lag(Event, default =0) 1))


# A tibble: 40 x 4
# Groups:   prov [4]
     Per prov     TF Event
   <int> <chr> <int> <dbl>
 1     1 A         0     0
 2     2 A         1     1
 3     3 A         0     1
 4     4 A         0     0
 5     5 A         1     1
 6     6 A         0     1
 7     7 A         0     0
 8     8 A         0     0
 9     9 A         1     1
10    10 A         1     0
# ... with 30 more rows

These results are wrong because Event[3] TF[3] != TF[2] thus the value should be Event[2] 1 which is 2.

This can be done with loops but a vectorized approach is preferred.

CodePudding user response:

The essence to solve your problem is cumsum.

Note that my set.seed result is different from yours.

library(dplyr)

set.seed(1)
DFSurv = data.frame(Per = c(1:10,1:10,1:10, 1:10),
                    prov = c(rep("A",10),rep("B",10), rep("D",10),rep("F",10)),
                    TF = sample(0:1,size=40,replace=TRUE))

DFSurv %>% 
  group_by(prov) %>% 
  mutate(Event = cumsum(abs(c(0, diff(TF)))))
#> # A tibble: 40 × 4
#> # Groups:   prov [4]
#>      Per prov     TF Event
#>    <int> <chr> <int> <dbl>
#>  1     1 A         0     0
#>  2     2 A         1     1
#>  3     3 A         0     2
#>  4     4 A         0     2
#>  5     5 A         1     3
#>  6     6 A         0     4
#>  7     7 A         0     4
#>  8     8 A         0     4
#>  9     9 A         1     5
#> 10    10 A         1     5
#> # … with 30 more rows

Created on 2022-05-01 by the reprex package (v2.0.1)

CodePudding user response:

Try this:

library(tidyverse)

set.seed(1)

DFSurv <- data.frame(
  Per = c(1:10, 1:10, 1:10, 1:10),
  prov = c(rep("A", 10), rep("B", 10), rep("D", 10), rep("F", 10)),
  TF = sample(0:1, size = 40, replace = TRUE)
)

DFSurv %>%
  arrange(prov, Per) %>%
  group_by(prov) %>%
  mutate(event = if_else(TF != lag(TF) & !is.na(lag(TF)), 1, 0),
         event_cum = cumsum(event))
#> # A tibble: 40 × 5
#> # Groups:   prov [4]
#>      Per prov     TF event event_cum
#>    <int> <chr> <int> <dbl>     <dbl>
#>  1     1 A         0     0         0
#>  2     2 A         1     1         1
#>  3     3 A         0     1         2
#>  4     4 A         0     0         2
#>  5     5 A         1     1         3
#>  6     6 A         0     1         4
#>  7     7 A         0     0         4
#>  8     8 A         0     0         4
#>  9     9 A         1     1         5
#> 10    10 A         1     0         5
#> # … with 30 more rows

Created on 2022-05-01 by the reprex package (v2.0.1)

  • Related