Changing a 2-state factor to a 3-state factor

Currently, the STATE column in my data frame takes two values (1 and 2), and I would like to recode it so that it takes three values (1, 2 and 3). The current data is shown first, followed by the desired result:

ID       STATE AGE  W_NO
90500974 1     17   1
90500974 2     17   2
90500974 1     17   3
90500974 1     17   4
90500974 2     17   5
90500975 1     17   1
90500975 2     17   2
90500975 2     17   3
90500975 1     17   4
90500975 2     17   5

Desired result:

ID       STATE AGE  W_NO
90500974 1     17   1
90500974 2     17   2
90500974 3     17   3
90500974 3     17   4
90500974 2     17   5
90500975 1     17   1
90500975 2     17   2
90500975 2     17   3
90500975 3     17   4
90500975 2     17   5

That is, within each subject (ID), every 1 that appears after a 2 should be converted to 3. How can this be done in R?

CodePudding user response：

This should work:

library(dplyr)
df1 %>%
  group_by(ID) %>%
  mutate(
    STATE = ifelse(STATE == 1, STATE + (cumsum(STATE == 2) > 0) * 2, STATE)
  ) %>%
  ungroup()
# # A tibble: 10 × 4
#          ID STATE   AGE  W_NO
#       <int> <dbl> <int> <int>
#  1 90500974     1    17     1
#  2 90500974     2    17     2
#  3 90500974     3    17     3
#  4 90500974     3    17     4
#  5 90500974     2    17     5
#  6 90500975     1    17     1
#  7 90500975     2    17     2
#  8 90500975     2    17     3
#  9 90500975     3    17     4
# 10 90500975     2    17     5

(Using akrun's kindly shared data)

Or slightly more concisely, at the cost of some clarity:

df1 %>%
  group_by(ID) %>%
  mutate(
    STATE = STATE + (cumsum(STATE == 2) > 0 & STATE == 1) * 2
  ) %>%
  ungroup()
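
For anyone who would rather not load dplyr, the same cumsum idea can probably be written in base R with ave(). This is only a sketch, assuming the rows are already ordered by W_NO within each ID; df1 is rebuilt here from the data posted in this thread, and seen_two is just an illustrative name.

```r
# Example data, rebuilt from the values posted in this thread
df1 <- data.frame(
  ID    = rep(c(90500974L, 90500975L), each = 5),
  STATE = c(1L, 2L, 1L, 1L, 2L, 1L, 2L, 2L, 1L, 2L),
  AGE   = 17L,
  W_NO  = rep(1:5, times = 2)
)

# Per ID, count the 2s seen so far; > 0 means "a 2 has already occurred"
# (seen_two is a hypothetical helper name, not from the original answers)
seen_two <- ave(as.integer(df1$STATE == 2), df1$ID, FUN = cumsum) > 0

# Recode only the 1s that come after a 2 within the same ID
df1$STATE[seen_two & df1$STATE == 1] <- 3L

df1$STATE
# [1] 1 2 3 3 2 1 2 2 3 2
```

The running count is computed before any value is replaced, so the newly written 3s cannot affect the test for later rows.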

CodePudding user response：

We could use

library(dplyr)
df1 %>%
  group_by(ID) %>%
  mutate(STATE = replace(STATE,
    cumsum(STATE == 1 & lag(STATE, default = first(STATE)) == 2) > 0 &
      STATE == 1, 3)) %>%
  ungroup()

-output

# A tibble: 10 × 4
         ID STATE   AGE  W_NO
      <int> <dbl> <int> <int>
 1 90500974     1    17     1
 2 90500974     2    17     2
 3 90500974     3    17     3
 4 90500974     3    17     4
 5 90500974     2    17     5
 6 90500975     1    17     1
 7 90500975     2    17     2
 8 90500975     2    17     3
 9 90500975     3    17     4
10 90500975     2    17     5

Or using data.table

library(data.table)
setDT(df1)[df1[,  .I[seq_len(.N) > match(2, STATE) & STATE == 1], 
        ID]$V1, STATE := 3][]

data

df1 <- structure(list(ID = c(90500974L, 90500974L, 90500974L, 90500974L, 
90500974L, 90500975L, 90500975L, 90500975L, 90500975L, 90500975L
), STATE = c(1L, 2L, 1L, 1L, 2L, 1L, 2L, 2L, 1L, 2L), AGE = c(17L, 
17L, 17L, 17L, 17L, 17L, 17L, 17L, 17L, 17L), W_NO = c(1L, 2L, 
3L, 4L, 5L, 1L, 2L, 3L, 4L, 5L)), class = "data.frame", row.names = c(NA, 
-10L))