I have the following sample dataset where i want to create the NUMBER_OF_RENEWALS column.
Basically, when the value for SUB_ID changes for each ID I would like to add 1 to the NUMBER_OF_RENEWALS column
Below, there is an example dataset with the desired outcome (column NUMBER_OF_RENEWALS).
ID <- c("A", "A", "A" ,"A", "B", "B", "B", "C", "C", "C", "D", "D", "D", "D", "D")
SUB_ID <- c("250", "252", "252", "252", "200", "201", "202", "220", "220", "220", "250", "250", "251", "252", "252")
NUMBER_OF_RENEWALS <- c(0,1,1,1,0,1,2,0,0,0,0,0,1,2,2)
sample_df <- data.frame(ID, SUB_ID,NUMBER_OF_RENEWALS)
ID SUB_ID NUMBER_OF_RENEWALS
1 A 250 0
2 A 252 1
3 A 252 1
4 A 252 1
5 B 200 0
6 B 201 1
7 B 202 2
8 C 220 0
9 C 220 0
10 C 220 0
11 D 250 0
12 D 250 0
13 D 251 1
14 D 252 2
15 D 252 2
CodePudding user response:
After grouping by ID
you can use cumsum
and increase the count for number of renewals when there's a change in SUB_ID
:
library(tidyverse)
sample_df %>%
group_by(ID) %>%
mutate(NUMBER_OF_RENEWALS = cumsum(SUB_ID != lag(SUB_ID, default = first(SUB_ID))))
Or with data.table
rleid
you can do:
sample_df %>%
group_by(ID) %>%
mutate(NUMBER_OF_RENEWALS = data.table::rleid(SUB_ID) - 1)
Output
ID SUB_ID NUMBER_OF_RENEWALS
<chr> <chr> <int>
1 A 250 0
2 A 252 1
3 A 252 1
4 A 252 1
5 B 200 0
6 B 201 1
7 B 202 2
8 C 220 0
9 C 220 0
10 C 220 0
11 D 250 0
12 D 250 0
13 D 251 1
14 D 252 2
15 D 252 2
CodePudding user response:
Here's base R attempt -
transform(sample_df, NUMBER_OF_RENEWALS =
as.integer(ave(SUB_ID, ID, FUN = function(x) match(x, unique(x)))) - 1)
# ID SUB_ID NUMBER_OF_RENEWALS
#1 A 250 0
#2 A 252 1
#3 A 252 1
#4 A 252 1
#5 B 200 0
#6 B 201 1
#7 B 202 2
#8 C 220 0
#9 C 220 0
#10 C 220 0
#11 D 250 0
#12 D 250 0
#13 D 251 1
#14 D 252 2
#15 D 252 2