Suppose I have this data (sorted by id, Date and value):
id value Date
---- -------- -----------
1 A 2018-03-07
1 A 2018-03-26
1 B 2018-04-01
1 B 2018-04-15
1 C 2018-05-07
1 A 2018-06-26
2 A 2018-03-07
2 A 2018-03-26
2 B 2018-04-01
2 C 2018-04-15
2 C 2018-05-07
2 B 2018-06-26
The data can be generated using this code:
df <- as.data.frame(cbind(
id = c(1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2),
value = c("A", "A", "B", "B", "C", "A", "A", "A", "B", "C", "C", "B"),
Date = c("2018-03-27", "2018-03-26", "2018-04-01", "2018-04-15", "2018-05-07", "2018-06-26","2018-03-27", "2018-03-26", "2018-04-01", "2018-04-15", "2018-05-07", "2018-06-26")
))
I want to compare value
column from each (N 1) with the N row within the same id
column and add a new column to record the row number for each row. If the value
changes, the row number column adds 1 from the current number, for example:
id value Date Row_Number
---- -------- ----------- -----------
1 A 2018-03-07 1
1 A 2018-03-26 1
1 B 2018-04-01 2
1 B 2018-04-15 2
1 C 2018-05-07 3
1 A 2018-06-26 4
2 A 2018-03-07 1
2 A 2018-03-26 1
2 B 2018-04-01 2
2 C 2018-04-15 3
2 C 2018-05-07 3
2 B 2018-06-26 4
CodePudding user response:
You can use data.table::rleid
:
df %>%
group_by(id) %>%
mutate(rn = rleid(value))
# A tibble: 12 × 4
# Groups: id [2]
id value Date rn
<chr> <chr> <chr> <int>
1 1 A 2018-03-27 1
2 1 A 2018-03-26 1
3 1 B 2018-04-01 2
4 1 B 2018-04-15 2
5 1 C 2018-05-07 3
6 1 A 2018-06-26 4
7 2 A 2018-03-27 1
8 2 A 2018-03-26 1
9 2 B 2018-04-01 2
10 2 C 2018-04-15 3
11 2 C 2018-05-07 3
12 2 B 2018-06-26 4