I want to filter grouped data with specific column value but with missing value. For example, id=2
have only No
value, id=5
and id=7
have No
but there are NA
values. I need only these three Ids.
df2 <- data.frame(id=c(1,1,1,1,1,1,1,2,2,3,3,3,3,3,4,4,4,5,5,6,6,7,7,7,7),
cate=c('Yes',NA,NA,'Yes','Yes','Yes','Yes','No','No','No','No','Yes','Yes',
'No','No','Yes','Yes','No',NA,'No','Yes','No',NA,NA,'No'))
I expected
id cate
<dbl> <chr>
1 2 No
2 2 No
3 5 No
4 5 NA
5 7 No
6 7 NA
7 7 NA
8 7 No
I try with
librrary(tidyverse)
df2 %>%
group_by(id) %>%
filter(all(cate == 'No'))
but not select id 5 and 7.
CodePudding user response:
You could add na.rm = TRUE
into all()
to allow NA
in each ID.
library(dplyr)
df2 %>%
group_by(id) %>%
filter(all(cate == 'No', na.rm = TRUE)) %>%
ungroup()
# A tibble: 8 × 2
# id cate
# <dbl> <chr>
# 1 2 No
# 2 2 No
# 3 5 No
# 4 5 NA
# 5 7 No
# 6 7 NA
# 7 7 NA
# 8 7 No
CodePudding user response:
The all()
function will only return ids in your group by where ALL values in cate, per id, meet the condition, cate == 'No'. id=7 has some NAs so the all()
function won't return id=7.
This will return ids if their values are either 'No' OR NA:
df2 %>%
group_by(id) %>%
filter(all(cate== 'No'|is.na(cate)))
Output:
# A tibble: 8 × 2
# Groups: id [3]
id cate
<dbl> <chr>
1 2 No
2 2 No
3 5 No
4 5 NA
5 7 No
6 7 NA
7 7 NA
8 7 No
CodePudding user response:
base
R option using ave
with all
like this:
df2[with(df2, ave((cate== 'No'|is.na(cate)), id, FUN = all)), ]
#> id cate
#> 8 2 No
#> 9 2 No
#> 18 5 No
#> 19 5 <NA>
#> 22 7 No
#> 23 7 <NA>
#> 24 7 <NA>
#> 25 7 No
Created on 2022-08-27 with reprex v2.0.2