DATA = data.frame(STUDENT = c(1,1,1,1,2,2,2,3,3,3,3,3),
                  T = c(1,2,3,4,1,2,3,1,2,3,4,5),
                  SCORE = c(NA,1,5,2,3,4,4,1,4,5,2,2),
                  WANT = c('N','N','P','P','N','N','N','N','N','P','P','P'))
I have 'DATA' and wish to create the 'WANT' variable. It starts as 'N', but within each 'STUDENT', once there is a 'SCORE' of 5 or higher, the 'WANT' value becomes 'P' and stays that way for all later rows. I am looking for a dplyr solution.
CodePudding user response:
You can use cumany():
library(dplyr)
DATA %>%
group_by(STUDENT) %>%
  # Treat NA as 0; cumany() stays TRUE once any score has reached 5 or higher
  mutate(WANT2 = ifelse(cumany(ifelse(is.na(SCORE), 0, SCORE) >= 5),
                        "P", "N"))
# A tibble: 12 × 5
# Groups: STUDENT [3]
   STUDENT     T SCORE WANT  WANT2
     <dbl> <dbl> <dbl> <chr> <chr>
 1       1     1    NA N     N
 2       1     2     1 N     N
 3       1     3     5 P     P
 4       1     4     2 P     P
 5       2     1     3 N     N
 6       2     2     4 N     N
 7       2     3     4 N     N
 8       3     1     1 N     N
 9       3     2     4 N     N
10       3     3     5 P     P
11       3     4     2 P     P
12       3     5     2 P     P
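To see why this works, it can help to run the same idea on a single student's scores outside the pipeline. The following is only a minimal sketch: the vector `score` and the name `reached` are made up for illustration, and `coalesce()` is swapped in for the nested `ifelse()` as one possible way to treat NA as 0.

```r
library(dplyr)

# Hypothetical scores for one student, in time order
score <- c(NA, 1, 5, 2)

# coalesce() replaces NA with 0 so the comparison never yields NA;
# cumany() is TRUE from the first TRUE onwards
reached <- cumany(coalesce(score, 0) >= 5)
reached
# [1] FALSE FALSE  TRUE  TRUE

ifelse(reached, "P", "N")
# [1] "N" "N" "P" "P"
```

Inside `group_by(STUDENT)`, the same computation restarts for each student, which is what keeps one student's 5 from affecting another.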
CodePudding user response:
You can use cummax():
library(dplyr)
DATA %>%
group_by(STUDENT) %>%
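  # cummax() of the flag is 0 before the first score >= 5 and 1 from then on;
  # adding 1 turns it into an index into c("N", "P")
  mutate(WANT = c("N", "P")[cummax(SCORE >= 5 & !is.na(SCORE)) + 1])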
# A tibble: 12 × 4
# Groups: STUDENT [3]
   STUDENT     T SCORE WANT
     <dbl> <dbl> <dbl> <chr>
 1       1     1    NA N
 2       1     2     1 N
 3       1     3     5 P
 4       1     4     2 P
 5       2     1     3 N
 6       2     2     4 N
 7       2     3     4 N
 8       3     1     1 N
 9       3     2     4 N
10       3     3     5 P
11       3     4     2 P
12       3     5     2 P
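As a quick sanity check, the computed column can be compared against the 'WANT' column supplied in the question. This is only a sketch: the names `RESULT` and `WANT2` are chosen here for illustration, and the `arrange()` step is an added assumption that rows should be sorted by 'T' within each student before the cumulative step (the sample data already is).

```r
library(dplyr)

DATA <- data.frame(
  STUDENT = c(1,1,1,1,2,2,2,3,3,3,3,3),
  T       = c(1,2,3,4,1,2,3,1,2,3,4,5),
  SCORE   = c(NA,1,5,2,3,4,4,1,4,5,2,2),
  WANT    = c('N','N','P','P','N','N','N','N','N','P','P','P')
)

RESULT <- DATA %>%
  arrange(STUDENT, T) %>%   # assumption: cumulative logic follows time order
  group_by(STUDENT) %>%
  # write to WANT2 so the expected WANT column is not overwritten
  mutate(WANT2 = c("N", "P")[cummax(SCORE >= 5 & !is.na(SCORE)) + 1]) %>%
  ungroup()

# TRUE when every computed value matches the expected one
all(RESULT$WANT2 == RESULT$WANT)
# [1] TRUE
```

If the data might arrive unsorted, keeping the `arrange()` call matters, because both `cumany()` and `cummax()` work in row order.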