Home > Software engineering >  conditionally filter with NA
conditionally filter with NA

Time:12-26

I want to filter by ID where GRP is A or B or NA. If one ID contains both A B then B should be deleted.

Data:

   DF<-tibble::tribble(
      ~ID, ~GRP,
       1L,  "A",
       2L,  "A",
       2L,  "B",
       3L,  "B",
       3L,   NA,
       4L,  "A",
       4L,  "A",
       4L,   NA
      )


# A tibble: 8 × 2
     ID GRP  
  <int> <chr>
1     1 A    
2     2 A    
3     2 B    
4     3 B    
5     3 NA   
6     4 A    
7     4 A    
8     4 NA  

Desired output:

 A tibble: 7 × 2
     ID GRP  
  <int> <chr>
1     1 A    
2     2 A    
3     3 B    
4     3 NA   
5     4 A    
6     4 A    
7     4 NA 

Best regards, H

CodePudding user response:

We could construct a logic with n_distinct after grouping by 'ID'

library(dplyr)
DF %>% 
  group_by(ID) %>% 
  filter(all(c("A", "B") %in% GRP) & 
              GRP != "B"|
      (is.na(GRP)|n_distinct(GRP, na.rm = TRUE) == 1)) %>%
  ungroup

-output

# A tibble: 7 × 2
     ID GRP  
  <int> <chr>
1     1 A    
2     2 A    
3     3 B    
4     3 <NA> 
5     4 A    
6     4 A    
7     4 <NA> 

Or create the logic with the first non-NA element

DF %>%
   group_by(ID) %>% 
   filter(GRP %in% first(na.omit(GRP)) | is.na(GRP)) %>%
   ungroup
# A tibble: 7 × 2
     ID GRP  
  <int> <chr>
1     1 A    
2     2 A    
3     3 B    
4     3 <NA> 
5     4 A    
6     4 A    
7     4 <NA> 

Or may use two n_distinct

DF %>%
   group_by(ID) %>%
   filter(n_distinct(GRP, na.rm = TRUE) == 2 & GRP != "B"| 
          n_distinct(GRP, na.rm = TRUE) == 1) %>% 
   ungroup

Or using base R

i1 <- with(DF, ID %in% names(which(rowSums(table(ID, GRP) > 0) == 2)))
subset(DF, GRP == 'A' & i1|!i1)
  •  Tags:  
  • r
  • Related