I have the following data and I need to filter group id those have only single yes
value for column category
.
data <- data.frame(id=c(1,1,1,1,1,1,2,2,2,2,2,3,3,3,3,3,4,4,4,4,4,5,5,5,5,6,6,6,6,6),
category=c(NA,'no',NA,'yes',NA,'yes',NA,NA,'no',NA,'no','yes',NA,'no','yes','no','no',NA,'no',NA,'yes','no',NA,'yes','no',NA,'no','yes','no','yes'))
Expected output:
id category
1 4 no
2 4 <NA>
3 4 no
4 4 <NA>
5 4 yes
6 5 no
7 5 <NA>
8 5 yes
9 5 no
I try it using
library(dplyr)
data1 <- data %>%
group_by(id) %>%
filter((cumsum(category == 'yes')<= 1), na.rm=TRUE) %>%
ungroup
CodePudding user response:
You can follow your original approach with a minor edit:
data %>%
group_by(id) %>%
filter(sum(category=="yes",na.rm=T)==1)
Output:
id category
<dbl> <chr>
1 4 no
2 4 NA
3 4 no
4 4 NA
5 4 yes
6 5 no
7 5 NA
8 5 yes
9 5 no
CodePudding user response:
FYI, here is a base R approach:
#find counts for the yes/no
out<-table(data$category, data$id)
#extract the ids meeting the criteria
ids <- which(out["yes",]==1)
#Get the values
data[data$id %in% ids, ]