Home > database >  Row_number based on multiple columns sorted by other columns in R
Row_number based on multiple columns sorted by other columns in R

Time:10-15

Suppose I have this data (sorted by id, Date and value):

  id   value      Date 
 ---- --------   -----------    
   1    A        2018-03-07     
   1    A        2018-03-26
   1    B        2018-04-01     
   1    B        2018-04-15
   1    C        2018-05-07     
   1    A        2018-06-26

   2    A        2018-03-07     
   2    A        2018-03-26
   2    B        2018-04-01     
   2    C        2018-04-15
   2    C        2018-05-07     
   2    B        2018-06-26
 

The data can be generated using this code:

df <- as.data.frame(cbind( 
                      id = c(1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2), 
                   value = c("A", "A", "B", "B", "C", "A", "A", "A", "B", "C", "C", "B"),
       
                   Date = c("2018-03-27", "2018-03-26", "2018-04-01", "2018-04-15", "2018-05-07", "2018-06-26","2018-03-27", "2018-03-26", "2018-04-01", "2018-04-15", "2018-05-07", "2018-06-26")
                      ))

I want to compare value column from each (N 1) with the N row within the same id column and add a new column to record the row number for each row. If the value changes, the row number column adds 1 from the current number, for example:

 id     value      Date          Row_Number
 ---- --------   -----------   ----------- 
   1    A        2018-03-07     1
   1    A        2018-03-26     1
   1    B        2018-04-01     2
   1    B        2018-04-15     2
   1    C        2018-05-07     3
   1    A        2018-06-26     4

   2    A        2018-03-07     1
   2    A        2018-03-26     1
   2    B        2018-04-01     2
   2    C        2018-04-15     3
   2    C        2018-05-07     3
   2    B        2018-06-26     4

CodePudding user response:

You can use data.table::rleid:

df %>%
  group_by(id) %>%
  mutate(rn = rleid(value))

# A tibble: 12 × 4
# Groups:   id [2]
   id    value Date          rn
   <chr> <chr> <chr>      <int>
 1 1     A     2018-03-27     1
 2 1     A     2018-03-26     1
 3 1     B     2018-04-01     2
 4 1     B     2018-04-15     2
 5 1     C     2018-05-07     3
 6 1     A     2018-06-26     4
 7 2     A     2018-03-27     1
 8 2     A     2018-03-26     1
 9 2     B     2018-04-01     2
10 2     C     2018-04-15     3
11 2     C     2018-05-07     3
12 2     B     2018-06-26     4
  • Related