My data looks something like this:
id <- c(1,1,1,1,1,1,1,1,2,2,2,2,2,2)
var1 <- c(1,2,2,3,4,4,4,4,1,1,2,2,2,3)
var2 <- c(1,2,2,2,2,3,4,4,1,1,1,2,2,2)
df <- data.frame(id,var1,var2)
Which would look like this:
id var1 var2
1 1 1
1 2 2
1 2 2
1 3 2
1 4 2
1 4 3
1 4 4
1 4 4
2 1 1
2 1 1
2 2 1
2 2 2
2 2 2
2 3 2
I would like to create a new column which increments by 1 for each change in either var1 or var2 and resets for each id. So my desired output would be:
id var1 var2 var3
1 1 1 1
1 2 2 2
1 2 2 2
1 3 2 3
1 4 2 4
1 4 3 5
1 4 4 6
1 4 4 6
2 1 1 1
2 1 1 1
2 2 1 2
2 2 2 3
2 2 2 3
2 3 2 4
I tried multiple things I found here, but nothing does exactly what I am looking for. For example both:
DT <- as.data.table(df, keep.rownames = T)
DT[, var3 := .GRP, by = list(id, var1, var2)][]
or
indx <- as.character(interaction(df[c("id", "var1", "var2")]))
df$var3<- cumsum(c(TRUE,indx[-1]!=indx[-length(indx)]))
do not reset for a new subject. I appreciate your help.
CodePudding user response:
df %>%
group_by(id) %>%
mutate(var3 = cumsum(c(0, diff(var1) | diff(var2)) != 0) 1) %>%
ungroup
# A tibble: 14 x 4
id var1 var2 var3
<dbl> <dbl> <dbl> <dbl>
1 1 1 1 1
2 1 2 2 2
3 1 2 2 2
4 1 3 2 3
5 1 4 2 4
6 1 4 3 5
7 1 4 4 6
8 1 4 4 6
9 2 1 1 1
10 2 1 1 1
11 2 2 1 2
12 2 2 2 3
13 2 2 2 3
14 2 3 2 4
CodePudding user response:
Using cumsum
, increment by 1 if:
- the previous value of
var1
is different from the current value - the previous value of
var2
is different from the current value.
library(tidyverse)
df %>%
group_by(id) %>%
mutate(var3 = cumsum(lag(var1, default = 0) != var1 | lag(var2, default = 0) != var2))
If you have more than two columns, you can use if_any
:
df %>%
group_by(id) %>%
mutate(var3 = cumsum(if_any(var1:var2, ~ c(0, diff(.x)) != 0)) 1)
output
# A tibble: 14 x 4
# Groups: id [2]
id var1 var2 var3
<dbl> <dbl> <dbl> <int>
1 1 1 1 1
2 1 2 2 2
3 1 2 2 2
4 1 3 2 3
5 1 4 2 4
6 1 4 3 5
7 1 4 4 6
8 1 4 4 6
9 2 1 1 1
10 2 1 1 1
11 2 2 1 2
12 2 2 2 3
13 2 2 2 3
14 2 3 2 4
CodePudding user response:
Using data.table
- shift
can take multiple columns at once, by default the type = "lag"
, use Map
to check if the corresponding values are not equal to the columns (after grouping by 'id') and then Reduce
the logical columns to a single logical vector and get the cumsum
library(data.table)
setDT(df)[, var3 := cumsum(Reduce(`|`, Map(`!=` ,
shift(.SD, fill = 0), .SD))), id]
-output
> df
id var1 var2 var3
<num> <num> <num> <int>
1: 1 1 1 1
2: 1 2 2 2
3: 1 2 2 2
4: 1 3 2 3
5: 1 4 2 4
6: 1 4 3 5
7: 1 4 4 6
8: 1 4 4 6
9: 2 1 1 1
10: 2 1 1 1
11: 2 2 1 2
12: 2 2 2 3
13: 2 2 2 3
14: 2 3 2 4