Group by and conditional filter-CodePudding

I have a tibble which resembles the following:

data<-tibble(ref=c("ABC", "ABC", "XYZ", "XYZ", "FGH", "FGH", "FGH"), 
             type=c("A", "B", "A", "A", "A", "A", "B"))

  ref   type 
1 ABC   A    
2 ABC   B    
3 XYZ   A
4 XYZ   A    
5 FGH   A    
6 FGH   A    
7 FGH   B

I need to group by ref and if--within a group--type B is present, return that row, else default to return any row (but only 1 row) of type A.

Expected output:

  ref   type 
1 ABC   B      
2 XYZ   A    
3 FGH   B

CodePudding user response：

with large amounts of data, it is better to do sorting before grouping

tidyverse

library(tidyverse)
df<-tibble(ref=c("ABC", "ABC", "XYZ", "XYZ", "FGH", "FGH", "FGH"), 
             type=c("A", "B", "A", "A", "A", "A", "B"))

distinct(df) %>% 
  arrange(ref, desc(type)) %>% 
  group_by(ref) %>% 
  slice_head(n = 1) %>% 
  ungroup()
#> # A tibble: 3 × 2
#>   ref   type 
#>   <chr> <chr>
#> 1 ABC   B    
#> 2 FGH   B    
#> 3 XYZ   A

data.table

^{Created on 2022-04-27 by the reprex package (v2.0.1)}

df<-data.frame(ref=c("ABC", "ABC", "XYZ", "XYZ", "FGH", "FGH", "FGH"), 
             type=c("A", "B", "A", "A", "A", "A", "B"))

library(data.table)
setDT(df)[order(ref, -type), .SD[1], by = ref]
#>    ref type
#> 1: ABC    B
#> 2: FGH    B
#> 3: XYZ    A

^{Created on 2022-04-27 by the reprex package (v2.0.1)}

CodePudding user response：

If you only have A and B, then you can arrange and simply get the first row, i.e.

library(dplyr)

data %>% 
 group_by(ref) %>% 
 filter(type %in% c('A', 'B')) %>% #If other types exist
 arrange(desc(type)) %>% 
 slice(1L)

# A tibble: 3 x 2
# Groups:   ref [3]
  ref   type 
  <chr> <chr>
1 ABC   B    
2 FGH   B    
3 XYZ   A

CodePudding user response：

We can use which.max over boolean to extract the desired rows

data %>%
    group_by(ref) %>%
    slice(which.max(type == "B")) %>%
    ungroup()

which gives

# A tibble: 3 x 2
  ref   type 
  <chr> <chr>
1 ABC   B
2 FGH   B
3 XYZ   A