Unfortunately, I can't wrap my head around this, but I'm sure there is a straightforward solution. I have a data.frame that looks like this:
set.seed(1)
mydf <- data.frame(group = sample(c("a", "b"), 20, replace = TRUE))
I'd like to create a new variable that counts, from top to bottom, how many times the group has occurred in a row so far. For the example above, it should look like this:
mydf$question <- c(1, 2, 1, 2, 1, 1, 2, 3, 4, 1, 2, 3, 1, 1, 1, 1, 1, 2, 1, 1)
> mydf[1:10,]
   group question
1      a        1
2      a        2
3      b        1
4      b        2
5      a        1
6      b        1
7      b        2
8      b        3
9      b        4
10     a        1
Thanks for your help.
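The expected question column is just each row's position within its run of identical values, so a base R alternative (not taken from the answers below) is sequence() over the run lengths from rle(). One wrinkle: R 3.6.0 changed the default sample() algorithm, so set.seed(1) no longer reproduces the data behind the expected output above, which is presumably why the answers print a different sequence. This sketch assumes R >= 3.6.0 and temporarily switches to the old "Rounding" sampler via set.seed(..., sample.kind = "Rounding"); R warns that this sampler is non-uniform, so the warning is suppressed here. The name expected is illustrative.

```r
# Assumes R >= 3.6.0: use the pre-3.6.0 "Rounding" sampler to recreate the question's data
suppressWarnings(set.seed(1, sample.kind = "Rounding"))
mydf <- data.frame(group = sample(c("a", "b"), 20, replace = TRUE))
RNGkind(sample.kind = "Rejection")  # switch back to the current default sampler

# rle() only accepts atomic vectors, so convert in case group is a factor
runs <- rle(as.character(mydf$group))

# sequence(n) concatenates seq_len(n[1]), seq_len(n[2]), ...,
# which is each row's position within its run
mydf$question <- sequence(runs$lengths)

# Expected counts copied from the question
expected <- c(1, 2, 1, 2, 1, 1, 2, 3, 4, 1, 2, 3, 1, 1, 1, 1, 1, 2, 1, 1)
print(all(mydf$question == expected))
```

When group is already a character column (the data.frame() default since R 4.0.0), the core of this reduces to the single line mydf$question <- sequence(rle(mydf$group)$lengths).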
Answer:
Using data.table::rleid() and dplyr, you could do:
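set.seed(1)
mydf <- data.frame(group = sample(c("a", "b"), 20, replace = TRUE))

library(dplyr)
library(data.table)

mydf %>%
  # rleid() gives all rows in the same run of identical values one shared id
  mutate(id = data.table::rleid(group)) %>%
  group_by(id) %>%
  # number the rows within each run: 1, 2, 3, ...
  mutate(question = row_number()) %>%
  ungroup()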
#> # A tibble: 20 × 3
#>    group    id question
#>    <chr> <int>    <int>
#>  1 a         1        1
#>  2 b         2        1
#>  3 a         3        1
#>  4 a         3        2
#>  5 b         4        1
#>  6 a         5        1
#>  7 a         5        2
#>  8 a         5        3
#>  9 b         6        1
#> 10 b         6        2
#> 11 a         7        1
#> 12 a         7        2
#> 13 a         7        3
#> 14 a         7        4
#> 15 a         7        5
#> 16 b         8        1
#> 17 b         8        2
#> 18 b         8        3
#> 19 b         8        4
#> 20 a         9        1
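The two steps in this answer can also be sketched in base R, as a swapped-in alternative rather than the answer's method: flagging every position where the value differs from the previous one and taking the cumulative sum gives the same kind of run id as data.table::rleid(), and ave() with seq_along() plays the role of group_by() plus row_number(). The group vector is hard-coded from the output above so the result does not depend on which sampler your R version uses, and run_id() is a hypothetical helper, not a library function.

```r
# group column as printed in the output above
group <- c("a", "b", "a", "a", "b", "a", "a", "a", "b", "b",
           "a", "a", "a", "a", "a", "b", "b", "b", "b", "a")

# Hypothetical helper: a new id starts wherever the value differs from the previous one.
# Assumes x has no NAs (a comparison with NA yields NA, which cumsum() would propagate).
run_id <- function(x) {
  if (length(x) == 0L) return(integer(0))
  cumsum(c(TRUE, x[-1L] != x[-length(x)]))
}

id <- run_id(group)

# Within each run id, number the rows 1, 2, 3, ...
question <- ave(seq_along(group), id, FUN = seq_along)

print(data.frame(group, id, question))
```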
Answer:
Update: Mostly the same as stefan's answer, but without the data.table package:
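library(dplyr)

mydf %>%
  # run id in base R: rle() gives the run lengths, rep() repeats each run's index that often
  # (as.character() because rle() rejects factors, the default column type before R 4.0.0)
  mutate(myrleid = with(rle(as.character(group)), rep(seq_along(lengths), lengths))) %>%
  group_by(myrleid) %>%
  # position within each run
  mutate(question = row_number()) %>%
  ungroup()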
   group myrleid question
   <chr>   <int>    <int>
 1 a           1        1
 2 b           2        1
 3 a           3        1
 4 a           3        2
 5 b           4        1
 6 a           5        1
 7 a           5        2
 8 a           5        3
 9 b           6        1
10 b           6        2
11 a           7        1
12 a           7        2
13 a           7        3
14 a           7        4
15 a           7        5
16 b           8        1
17 b           8        2
18 b           8        3
19 b           8        4
20 a           9        1
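A small worked example of the run-id expression in this answer, on a hypothetical five-element vector: rle() returns the length and value of each run, and rep(seq_along(lengths), lengths) repeats each run's index as many times as that run is long. The input is deliberately a factor, which is what data.frame() produced for strings before R 4.0.0 (stringsAsFactors = TRUE); rle() rejects factors, hence the as.character() conversion.

```r
x <- factor(c("a", "a", "b", "b", "a"))  # hypothetical input stored as a factor

# rle(x) would stop with an error because x is a factor, so convert first
r <- rle(as.character(x))
print(r$lengths)  # 2 2 1
print(r$values)   # "a" "b" "a"

# Inside with(), lengths refers to r$lengths; each run index is repeated over its run
run_ids <- with(r, rep(seq_along(lengths), lengths))
print(run_ids)    # 1 1 2 2 3
```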