Currently, the STATE column in my data frame takes two values (1 and 2), but I would like to recode it so that it takes three values (1, 2 and 3). The current data is shown first, followed by the result I want:
ID STATE AGE W_NO
90500974 1 17 1
90500974 2 17 2
90500974 1 17 3
90500974 1 17 4
90500974 2 17 5
90500975 1 17 1
90500975 2 17 2
90500975 2 17 3
90500975 1 17 4
90500975 2 17 5
Desired output:
ID STATE AGE W_NO
90500974 1 17 1
90500974 2 17 2
90500974 3 17 3
90500974 3 17 4
90500974 2 17 5
90500975 1 17 1
90500975 2 17 2
90500975 2 17 3
90500975 3 17 4
90500975 2 17 5
That is, within each subject (ID), every 1 that appears after a 2 should be converted to 3. How can this be done in R?
CodePudding user response:
This should work:
library(dplyr)
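df1 %>%
  group_by(ID) %>%
  mutate(
    # cumsum(STATE == 2) > 0 is TRUE from the first 2 onwards within each ID,
    # so 2 is added to every 1 in that range, turning it into 3
    STATE = ifelse(STATE == 1, STATE + (cumsum(STATE == 2) > 0) * 2, STATE)
  ) %>%
  ungroup()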
# # A tibble: 10 × 4
# ID STATE AGE W_NO
# <int> <dbl> <int> <int>
# 1 90500974 1 17 1
# 2 90500974 2 17 2
# 3 90500974 3 17 3
# 4 90500974 3 17 4
# 5 90500974 2 17 5
# 6 90500975 1 17 1
# 7 90500975 2 17 2
# 8 90500975 2 17 3
# 9 90500975 3 17 4
# 10 90500975 2 17 5
(Using akrun's kindly shared data)
Or slightly more concisely, at the cost of some clarity:
df1 %>%
  group_by(ID) %>%
  mutate(
    # add 2 only where the value is 1 and a 2 has already occurred
    STATE = STATE + (cumsum(STATE == 2) > 0 & STATE == 1) * 2
  ) %>%
  ungroup()
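To see why the cumsum() flag does the job, it can help to look at the intermediate vectors for a single subject. This is only a minimal base R sketch, assuming the STATE values of ID 90500974 from the question; the names `state`, `seen_two` and `result` are made up for illustration.

```r
# STATE values for subject 90500974, copied from the question
state <- c(1L, 2L, 1L, 1L, 2L)

# Running count of 2s; it is positive from the first 2 onwards
seen_two <- cumsum(state == 2) > 0

# Add 2 only where the value is 1 and a 2 has already occurred
result <- state + (seen_two & state == 1) * 2

print(seen_two)
print(result)
```

Inside group_by(ID), mutate() evaluates the same expression separately for each subject, so the flag restarts at every new ID.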
CodePudding user response:
We could use
library(dplyr)
df1 %>%
  group_by(ID) %>%
  mutate(STATE = replace(
    STATE,
    cumsum(STATE == 1 & lag(STATE, default = first(STATE)) == 2) & STATE == 1,
    3
  )) %>%
  ungroup()
-output
# A tibble: 10 × 4
ID STATE AGE W_NO
<int> <dbl> <int> <int>
1 90500974 1 17 1
2 90500974 2 17 2
3 90500974 3 17 3
4 90500974 3 17 4
5 90500974 2 17 5
6 90500975 1 17 1
7 90500975 2 17 2
8 90500975 2 17 3
9 90500975 3 17 4
10 90500975 2 17 5
Or using data.table
library(data.table)
setDT(df1)[df1[, .I[seq_len(.N) > match(2, STATE) & STATE == 1],
ID]$V1, STATE := 3][]
data
df1 <- structure(list(ID = c(90500974L, 90500974L, 90500974L, 90500974L,
90500974L, 90500975L, 90500975L, 90500975L, 90500975L, 90500975L
), STATE = c(1L, 2L, 1L, 1L, 2L, 1L, 2L, 2L, 1L, 2L), AGE = c(17L,
17L, 17L, 17L, 17L, 17L, 17L, 17L, 17L, 17L), W_NO = c(1L, 2L,
3L, 4L, 5L, 1L, 2L, 3L, 4L, 5L)), class = "data.frame", row.names = c(NA,
-10L))
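If installing extra packages is not an option, the same rule could probably be written in base R with ave(). This is a hedged sketch rather than part of the answers above: it rebuilds the example data in a shorter form so that it runs on its own, and `after_two` is just an illustrative name.

```r
# Same example data as above, rebuilt compactly so the sketch is self-contained
df1 <- data.frame(
  ID = rep(c(90500974L, 90500975L), each = 5),
  STATE = c(1L, 2L, 1L, 1L, 2L, 1L, 2L, 2L, 1L, 2L),
  AGE = 17L,
  W_NO = rep(1:5, times = 2)
)

# Per-ID running count of 2s; positive once a 2 has been seen for that ID
after_two <- ave(as.integer(df1$STATE == 2), df1$ID, FUN = cumsum) > 0

# Recode every 1 that comes after a 2 within its ID
df1$STATE[after_two & df1$STATE == 1] <- 3L

print(df1)
```

Unlike the dplyr versions, this keeps STATE as an integer column because the replacement value is written as 3L.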