Create a column for each group when the other column changes-CodePudding

I have the following sample dataset where i want to create the NUMBER_OF_RENEWALS column.

Basically, when the value for SUB_ID changes for each ID I would like to add 1 to the NUMBER_OF_RENEWALS column

Below, there is an example dataset with the desired outcome (column NUMBER_OF_RENEWALS).

ID <- c("A", "A", "A" ,"A", "B", "B", "B", "C", "C", "C", "D", "D", "D", "D", "D")
SUB_ID <- c("250", "252", "252", "252", "200", "201", "202", "220", "220", "220", "250", "250", "251", "252", "252")
NUMBER_OF_RENEWALS <- c(0,1,1,1,0,1,2,0,0,0,0,0,1,2,2)

sample_df <- data.frame(ID, SUB_ID,NUMBER_OF_RENEWALS)



   ID SUB_ID NUMBER_OF_RENEWALS
1   A    250                  0
2   A    252                  1
3   A    252                  1
4   A    252                  1
5   B    200                  0
6   B    201                  1
7   B    202                  2
8   C    220                  0
9   C    220                  0
10  C    220                  0
11  D    250                  0
12  D    250                  0
13  D    251                  1
14  D    252                  2
15  D    252                  2

CodePudding user response：

After grouping by ID you can use cumsum and increase the count for number of renewals when there's a change in SUB_ID:

library(tidyverse)

sample_df %>%
  group_by(ID) %>%
  mutate(NUMBER_OF_RENEWALS = cumsum(SUB_ID != lag(SUB_ID, default = first(SUB_ID))))

Or with data.table rleid you can do:

sample_df %>%
  group_by(ID) %>%
  mutate(NUMBER_OF_RENEWALS = data.table::rleid(SUB_ID) - 1)

Output

   ID    SUB_ID NUMBER_OF_RENEWALS
   <chr> <chr>               <int>
 1 A     250                     0
 2 A     252                     1
 3 A     252                     1
 4 A     252                     1
 5 B     200                     0
 6 B     201                     1
 7 B     202                     2
 8 C     220                     0
 9 C     220                     0
10 C     220                     0
11 D     250                     0
12 D     250                     0
13 D     251                     1
14 D     252                     2
15 D     252                     2

CodePudding user response：

Here's base R attempt -

transform(sample_df, NUMBER_OF_RENEWALS = 
      as.integer(ave(SUB_ID, ID, FUN = function(x) match(x, unique(x)))) - 1)

#   ID SUB_ID NUMBER_OF_RENEWALS
#1   A    250                  0
#2   A    252                  1
#3   A    252                  1
#4   A    252                  1
#5   B    200                  0
#6   B    201                  1
#7   B    202                  2
#8   C    220                  0
#9   C    220                  0
#10  C    220                  0
#11  D    250                  0
#12  D    250                  0
#13  D    251                  1
#14  D    252                  2
#15  D    252                  2