Adding an index column to a dataframe, respecting periods of time and grouping

Time:06-08

I have a data.frame with dates and groups. I want to add an index column that numbers each instance in which a group occurs, where an instance is a continuous period of time. In the data.frame, group1 appears between '2022-05-02' and '2022-05-09', which counts as one instance, and then again between '2022-06-13' and '2022-06-20', which counts as another instance. In the example, the index column is the result I want. The index2 column is my best attempt so far; it is incorrect because dense_rank() lumps all the group1 rows together as a single instance.

library(tidyverse)
date <- as.Date(c('2022-05-02', '2022-05-09', '2022-05-16', '2022-05-23',
                  '2022-05-30', '2022-06-06', '2022-06-13', '2022-06-20'))
gp <- c("group1", "group1", "group2", "group2", "group3", "group3", "group1", "group1")
index <- c(1, 1, 2, 2, 3, 3, 4, 4)

data <- data.frame(date, gp, index, stringsAsFactors = FALSE)

data <- data %>%
  mutate(index2 = dense_rank(gp))

CodePudding user response:

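You can reproduce index in index2 with data.table::rleid(), which assigns a new id every time the value of gp changes from one row to the next. This relies on the rows already being sorted by date, as they are in the example.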

data %>% mutate(index2=data.table::rleid(gp))

Output:

        date     gp index index2
1 2022-05-02 group1     1      1
2 2022-05-09 group1     1      1
3 2022-05-16 group2     2      2
4 2022-05-23 group2     2      2
5 2022-05-30 group3     3      3
6 2022-06-06 group3     3      3
7 2022-06-13 group1     4      4
8 2022-06-20 group1     4      4
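
If you would rather not depend on data.table, the same run-based numbering can be sketched with dplyr alone. This is only an illustrative alternative, not part of the answer above: it assumes that a new instance starts whenever gp differs from the previous row after sorting by date, and the names result and index_rle are made up for the example. Like rleid(), it does not look at the size of the gap between dates.

```r
library(dplyr)

date <- as.Date(c("2022-05-02", "2022-05-09", "2022-05-16", "2022-05-23",
                  "2022-05-30", "2022-06-06", "2022-06-13", "2022-06-20"))
gp <- c("group1", "group1", "group2", "group2",
        "group3", "group3", "group1", "group1")
data <- data.frame(date, gp, stringsAsFactors = FALSE)

result <- data %>%
  arrange(date) %>%  # runs are only meaningful on date-ordered rows
  # TRUE wherever gp differs from the previous row; the running sum of
  # those changes, plus 1, numbers each consecutive run of the same group
  mutate(index2 = cumsum(gp != lag(gp, default = first(gp))) + 1L)

# Base R equivalent: expand the run lengths reported by rle() into run ids
runs <- rle(result$gp)
index_rle <- rep(seq_along(runs$lengths), runs$lengths)

print(result)
```

Both index2 and index_rle follow the same idea as rleid(): count how many times the group value has changed so far.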