R subsetting by unique observation and prioritizing a value

Time:01-30

I have a coding problem regarding subsetting my dataset. I would like to subset my data so that (1) there is exactly one observation per ID and (2) if an ID has "event" = 1 in any of its rows, a row with "event" = 1 is the one retained. No ID should be dropped, even if all of its rows have "event" = 0.

An example dataset looks like this:

 ID event
 A  1
 A  1
 A  0
 A  1
 B  0
 B  0
 B  0
 C  0
 C  1
 

Desired output:

 ID event
 A  1
 B  0
 C  1

I imagine this would be done using dplyr, starting with df %>% group_by(ID), but I'm unsure how to prioritize rows where event = 1 without dropping the IDs whose rows all have event = 0. I do not want to lose any of the IDs.

Any help would be appreciated - thank you very much.

CodePudding user response:

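Since event is either 0 or 1, the maximum of event within each ID is 1 if any row has event = 1 and 0 otherwise, so every ID is kept. In base R, we may use aggregate: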

aggregate(event ~ ID, df1, max)
   ID event
1  A     1
2  B     0
3  C     1
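
If the real data has more columns that should travel with the chosen row, aggregate with max would not keep them together. A minimal base R sketch, assuming the same example data as df1 (the names sorted and res_base are made up for illustration), is to sort so that event = 1 rows come first within each ID and then keep the first row per ID:

```r
# Example data, same values as df1 in this answer
df1 <- data.frame(
  ID = c("A", "A", "A", "A", "B", "B", "B", "C", "C"),
  event = c(1L, 1L, 0L, 1L, 0L, 0L, 0L, 0L, 1L)
)

# Sort by ID, then by event in descending order,
# so that any event = 1 row leads its group
sorted <- df1[order(df1$ID, -df1$event), ]

# Keep only the first row of each ID: every ID survives,
# and a row with event = 1 is preferred when one exists
res_base <- sorted[!duplicated(sorted$ID), ]
rownames(res_base) <- NULL
res_base
```

Because whole rows are selected rather than summarised, any additional columns would come from the same row as the retained event value.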

Or with dplyr, keeping the row with the largest event per ID:

library(dplyr)
df1 %>%
   group_by(ID) %>%
   slice_max(event, n = 1, with_ties = FALSE) %>%
   ungroup()
# A tibble: 3 × 2
  ID    event
  <chr> <int>
1 A         1
2 B         0
3 C         1
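
When only ID and event are needed, the same idea can also be written with summarise. The sketch below assumes dplyr 1.0 or later (for the .groups argument) and adds two sanity checks for the requirements in the question; the name res is made up for illustration:

```r
library(dplyr)

# Example data, same values as df1 in this answer
df1 <- data.frame(
  ID = c("A", "A", "A", "A", "B", "B", "B", "C", "C"),
  event = c(1L, 1L, 0L, 1L, 0L, 0L, 0L, 0L, 1L)
)

# Collapse each ID to a single row holding the maximum event,
# which is 1 if any row for that ID has event = 1, else 0
res <- df1 %>%
  group_by(ID) %>%
  summarise(event = max(event), .groups = "drop")

# Sanity checks: one row per ID, and no ID from the input is missing
stopifnot(!any(duplicated(res$ID)))
stopifnot(setequal(res$ID, df1$ID))

res
```

Unlike slice_max, summarise returns only the grouping column and the summarised column, so it fits best when no other columns need to be carried along.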

Data:

df1 <- structure(list(ID = c("A", "A", "A", "A", "B", "B", "B", "C", 
"C"), event = c(1L, 1L, 0L, 1L, 0L, 0L, 0L, 0L, 1L)), 
class = "data.frame", row.names = c(NA, 
-9L))