Home > OS >  Filter grouped data with specific column value
Filter grouped data with specific column value

Time:08-28

I want to filter grouped data with specific column value but with missing value. For example, id=2 have only No value, id=5 and id=7 have No but there are NA values. I need only these three Ids.

df2 <- data.frame(id=c(1,1,1,1,1,1,1,2,2,3,3,3,3,3,4,4,4,5,5,6,6,7,7,7,7),
                 cate=c('Yes',NA,NA,'Yes','Yes','Yes','Yes','No','No','No','No','Yes','Yes',
                        'No','No','Yes','Yes','No',NA,'No','Yes','No',NA,NA,'No'))

I expected

     id cate 
  <dbl> <chr>
1     2 No   
2     2 No  
3     5 No
4     5 NA
5     7 No
6     7 NA
7     7 NA
8     7 No

I try with

librrary(tidyverse)

df2 %>% 
  group_by(id) %>% 
  filter(all(cate == 'No')) 

but not select id 5 and 7.

CodePudding user response:

You could add na.rm = TRUE into all() to allow NA in each ID.

library(dplyr)

df2 %>%
  group_by(id) %>%
  filter(all(cate == 'No', na.rm = TRUE)) %>%
  ungroup()

# A tibble: 8 × 2
#      id cate 
#   <dbl> <chr>
# 1     2 No   
# 2     2 No   
# 3     5 No   
# 4     5 NA   
# 5     7 No   
# 6     7 NA   
# 7     7 NA   
# 8     7 No

CodePudding user response:

The all() function will only return ids in your group by where ALL values in cate, per id, meet the condition, cate == 'No'. id=7 has some NAs so the all() function won't return id=7.

This will return ids if their values are either 'No' OR NA:

df2 %>% 
  group_by(id) %>% 
  filter(all(cate== 'No'|is.na(cate))) 

Output:

# A tibble: 8 × 2
# Groups:   id [3]
     id cate 
  <dbl> <chr>
1     2 No   
2     2 No   
3     5 No   
4     5 NA   
5     7 No   
6     7 NA   
7     7 NA   
8     7 No   

CodePudding user response:

base R option using ave with all like this:

df2[with(df2, ave((cate== 'No'|is.na(cate)), id, FUN = all)), ]
#>    id cate
#> 8   2   No
#> 9   2   No
#> 18  5   No
#> 19  5 <NA>
#> 22  7   No
#> 23  7 <NA>
#> 24  7 <NA>
#> 25  7   No

Created on 2022-08-27 with reprex v2.0.2

  • Related