I have a data.frame with dates and groups, and I want to add an index column that numbers each instance in which a group occurs. An instance is a continuous period of time. In the data.frame below, group1 appears between '2022-05-02' and '2022-05-09', which counts as one instance, and again between '2022-06-13' and '2022-06-20', which counts as another. In the example, the index column is what I want; the index2 column is my best attempt so far, which is incorrect because it lumps all the group1 rows into a single instance.
library(tidyverse)

date <- as.Date(c('2022-05-02', '2022-05-09', '2022-05-16', '2022-05-23', '2022-05-30', '2022-06-06', '2022-06-13', '2022-06-20'))
gp <- c("group1", "group1", "group2", "group2", "group3", "group3", "group1", "group1")
index <- c(1, 1, 2, 2, 3, 3, 4, 4)  # desired result
data <- data.frame(date, gp, index, stringsAsFactors = FALSE)

# Attempt: dense_rank() ranks by value, so every "group1" row gets the same index
data <- data %>%
  mutate(index2 = dense_rank(gp))
Answer:
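You can replicate index in index2 using data.table::rleid(), which assigns a new id each time the value of gp changes from one row to the next (data.table must be installed, but it does not need to be attached):
data %>% mutate(index2 = data.table::rleid(gp))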
Output:
        date     gp index index2
1 2022-05-02 group1     1      1
2 2022-05-09 group1     1      1
3 2022-05-16 group2     2      2
4 2022-05-23 group2     2      2
5 2022-05-30 group3     3      3
6 2022-06-06 group3     3      3
7 2022-06-13 group1     4      4
8 2022-06-20 group1     4      4
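If you would rather not depend on data.table, the same run numbering can be sketched in base R. This is only an illustration of the idea behind rleid(), not its actual implementation: the variable names runs, index2 and index3 are my own, and it assumes the rows are already sorted by date and that gp contains no NA values. Recent dplyr releases (1.1.0 and later, as far as I know) also provide consecutive_id(), which may be worth checking if you want to stay within the tidyverse.

```r
# Base-R sketch: number consecutive runs of identical values.
gp <- c("group1", "group1", "group2", "group2",
        "group3", "group3", "group1", "group1")

# rle() collapses gp into runs; repeating each run's position by its
# length yields one id per row.
runs <- rle(gp)
index2 <- rep(seq_along(runs$lengths), times = runs$lengths)

# Equivalent: start at 1 and add 1 wherever a value differs from the
# one before it.
index3 <- cumsum(c(TRUE, gp[-1] != gp[-length(gp)]))

print(index2)
print(index3)
```

Both vectors can be assigned directly as a column, for example data$index2 <- index2. The cumsum() form drops straight into mutate() as well, provided the data is not grouped by gp at that point, since the comparison has to see neighbouring rows from different groups.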