I have a dataframe (df) that looks similar to this:
person | outcome |
---|---|
a | 1 |
a | 1 |
a | 0 |
a | 0 |
a | 0 |
b | 1 |
b | 0 |
b | 1 |
c | 1 |
c | 1 |
c | 0 |
c | 0 |
c | 0 |
For persons whose last observation is a 0, I would like to remove the trailing 0s plus the last 1, so that the final df looks like this:
person | outcome |
---|---|
a | 1 |
b | 1 |
b | 0 |
b | 1 |
c | 1 |
The three trailing 0s and the last 1 were removed for a and c, but b was left alone because its last observation was a 1. Is there a way to do this programmatically, or does it have to be done by hand?
CodePudding user response:
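Maybe this helps. Within each person, take the cumulative sum of outcome; if the group's last outcome is 0, keep only the rows where that running sum is still below its maximum, which drops the last 1 and everything after it.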
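library(dplyr)

df %>%
  group_by(person) %>%
  # running count of 1s within each person
  mutate(new = cumsum(outcome)) %>%
  # if the group ends in 0, keep only the rows before its last 1
  filter(if (last(outcome) == 0) new < max(new) else TRUE) %>%
  ungroup() %>%
  select(-new)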
Output:
# A tibble: 5 × 2
  person outcome
  <chr>    <int>
1 a            1
2 b            1
3 b            0
4 b            1
5 c            1
Data:
df <- structure(list(person = c("a", "a", "a", "a", "a", "b", "b",
"b", "c", "c", "c", "c", "c"), outcome = c(1L, 1L, 0L, 0L, 0L,
1L, 0L, 1L, 1L, 1L, 0L, 0L, 0L)), class = "data.frame", row.names = c(NA,
-13L))
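If you would rather not depend on dplyr, the same idea can probably be written in base R with ave(). This is only a sketch under the same assumptions as above: rows are already in observation order within each person, and outcome contains only 0 and 1. The helper names run, run_max, last_out, keep and result are made up for illustration.

```r
# Same example data as in the answer above
df <- data.frame(
  person  = c("a", "a", "a", "a", "a", "b", "b", "b", "c", "c", "c", "c", "c"),
  outcome = c(1L, 1L, 0L, 0L, 0L, 1L, 0L, 1L, 1L, 1L, 0L, 0L, 0L)
)

# Running count of 1s within each person (row order is preserved)
run <- ave(df$outcome, df$person, FUN = cumsum)

# Per-person maximum of that count, repeated on every row of the group
run_max <- ave(run, df$person, FUN = max)

# Per-person last outcome, repeated on every row of the group
last_out <- ave(df$outcome, df$person, FUN = function(x) x[length(x)])

# Keep whole groups that end in 1; otherwise keep only rows before the last 1
keep <- last_out == 1 | run < run_max
result <- df[keep, ]
rownames(result) <- NULL
result
```

Note that with either version, a person whose outcomes are all 0 is dropped entirely, because no row has a running sum below the group maximum of 0.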