I have a coding problem with subsetting my dataset. I would like to subset my data so that (1) there is exactly one observation per ID and (2) the retained row has "event" = 1 if that ID has event = 1 at any time, without dropping any IDs.
An example dataset looks like this:
ID event
A 1
A 1
A 0
A 1
B 0
B 0
B 0
C 0
C 1
Desired output
A 1
B 0
C 1
I imagine this would be done using dplyr, starting from df %>% group_by(ID), but I'm unsure how to prioritize selecting any row that contains event = 1 without losing the IDs where event is always 0. I do not want to lose any of the IDs.
Any help would be appreciated - thank you very much.
CodePudding user response:
We may use aggregate from base R, taking the max of event within each ID
aggregate(event ~ ID, df1, max)
ID event
1 A 1
2 B 0
3 C 1
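If the real data has more columns than ID and event, and the goal is to keep a whole row per ID rather than a summary, one possible base R sketch is to sort so that event = 1 rows come first within each ID and then keep the first row of each ID. The names ordered and res_base are only illustrative, and this assumes event is coded as 0/1.

```r
# Same example data as in the question
df1 <- data.frame(
  ID = c("A", "A", "A", "A", "B", "B", "B", "C", "C"),
  event = c(1L, 1L, 0L, 1L, 0L, 0L, 0L, 0L, 1L)
)

# Sort by ID, then by event descending, so any event == 1 row leads its group
ordered <- df1[order(df1$ID, -df1$event), ]

# Keep only the first row seen for each ID
res_base <- ordered[!duplicated(ordered$ID), ]
rownames(res_base) <- NULL
res_base
```

Every ID survives because each group contributes exactly one row, whether or not it ever had an event.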
Or with dplyr
library(dplyr)
df1 %>%
  group_by(ID) %>%
  slice_max(event, n = 1, with_ties = FALSE) %>%
  ungroup()
# A tibble: 3 × 2
ID event
<chr> <int>
1 A 1
2 B 0
3 C 1
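Two other dplyr variants that should give the same result on this example, shown as a sketch rather than the only way to do it: summarise() collapses each ID to a single flag, while arrange() followed by distinct() keeps one full row per ID and prefers rows with event = 1. The names res_summary and res_rows are illustrative, and event is assumed to be coded 0/1.

```r
library(dplyr)

# Same example data as in the question
df1 <- data.frame(
  ID = c("A", "A", "A", "A", "B", "B", "B", "C", "C"),
  event = c(1L, 1L, 0L, 1L, 0L, 0L, 0L, 0L, 1L)
)

# Variant 1: one flag per ID (1 if any event occurred, otherwise 0)
res_summary <- df1 %>%
  group_by(ID) %>%
  summarise(event = max(event), .groups = "drop")

# Variant 2: one whole row per ID, preferring rows where event == 1
res_rows <- df1 %>%
  arrange(ID, desc(event)) %>%
  distinct(ID, .keep_all = TRUE)
```

Variant 2 is the one to reach for when other columns from the chosen row need to be carried along.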
data
df1 <- structure(list(ID = c("A", "A", "A", "A", "B", "B", "B", "C",
"C"), event = c(1L, 1L, 0L, 1L, 0L, 0L, 0L, 0L, 1L)),
class = "data.frame", row.names = c(NA,
-9L))