I am trying to create an event counter that counts events based on transitions in two variables, var1 and var2. In the hypothetical data frame below, the expected behaviour is shown in the event column.
dt <- data.frame(var1 = c(-1, 1),
var2 = c('A', 'B', 'A', 'B'),
event = c(1, 1, 2, 2))
In the above data frame, using match, unique and group_by does not give the desired behaviour:
dt %>%
group_by(var2) %>%
mutate(event2 = match(var1, unique(var1)))
However, when I create a data frame in which each var1/var2 pair is repeated three times:
dt2 <- data.frame(var1 = c(rep(-1, 3), rep(1, 3), rep(1, 3), rep(-1, 3)),
var2 = c(rep('A', 3), rep('B', 3), rep('A', 3), rep('B', 3)),
event = c(rep(1, 6), rep(2, 6)))
Using match, unique and group_by reproduces the desired behaviour. What is the reason for this? Is there a way to create an event counter (or id) that identifies unique instances of 1 and -1 in var1 together with unique instances of A and B in var2, and that increments regardless of how many times the values 1, -1 and A, B occur in var1 and var2 respectively?
CodePudding user response:
You can use cumsum and lag:
library(dplyr)
dt2 %>%
mutate(event = cumsum(var2 == "A" & lag(var2, default = "B") != "A"))
var1 var2 event
1 -1 A 1
2 -1 A 1
3 -1 A 1
4 1 B 1
5 1 B 1
6 1 B 1
7 1 A 2
8 1 A 2
9 1 A 2
10 -1 B 2
11 -1 B 2
12 -1 B 2
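As for why match and unique behaved differently on the two data frames: within each var2 group, match(var1, unique(var1)) only numbers the distinct var1 values in order of first appearance. In dt2, var1 flips once inside each group, so the numbering happens to line up with the events. In dt, each group holds a single var1 value, so every row gets 1. The sketch below checks this and runs the cumsum/lag approach on both data frames. The helper names by_match and by_cumsum are made up for illustration, stringsAsFactors = FALSE is added so the comparison behaves the same on older R versions, and the cumsum version assumes every event starts with a block of "A" rows.

```r
library(dplyr)

# Data frames as in the question (event is the expected result).
# var1 in dt is written out in full instead of relying on recycling.
dt <- data.frame(var1 = c(-1, 1, -1, 1),
                 var2 = c("A", "B", "A", "B"),
                 event = c(1, 1, 2, 2),
                 stringsAsFactors = FALSE)
dt2 <- data.frame(var1 = c(rep(-1, 3), rep(1, 3), rep(1, 3), rep(-1, 3)),
                  var2 = rep(c("A", "B", "A", "B"), each = 3),
                  event = rep(c(1, 2), each = 6),
                  stringsAsFactors = FALSE)

# Approach from the question: number the distinct var1 values per var2 group.
# (hypothetical helper name)
by_match <- function(d) {
  d %>%
    group_by(var2) %>%
    mutate(event2 = match(var1, unique(var1))) %>%
    ungroup()
}

# Approach from this answer: count the rows where var2 switches to "A".
# Assumes each event begins with "A"; leading "B" rows would get 0.
# (hypothetical helper name)
by_cumsum <- function(d) {
  d %>%
    mutate(event2 = cumsum(var2 == "A" & lag(var2, default = "B") != "A"))
}

by_match(dt)$event2    # 1 1 1 1: one var1 value per var2 group
by_match(dt2)$event2   # 1 1 1 1 1 1 2 2 2 2 2 2: var1 flips inside each group
by_cumsum(dt)$event2   # 1 1 2 2
by_cumsum(dt2)$event2  # 1 1 1 1 1 1 2 2 2 2 2 2
```

The cumsum version never looks at var1, so it does not depend on how many rows each block has. It only depends on var2 returning to "A" at the start of each new event.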