I have a tibble
which resembles the following:
data<-tibble(ref=c("ABC", "ABC", "XYZ", "XYZ", "FGH", "FGH", "FGH"),
type=c("A", "B", "A", "A", "A", "A", "B"))
ref type
1 ABC A
2 ABC B
3 XYZ A
4 XYZ A
5 FGH A
6 FGH A
7 FGH B
I need to group by ref
and if--within a group--type
B is present, return that row, else default to return any row (but only 1 row) of type
A.
Expected output:
ref type
1 ABC B
2 XYZ A
3 FGH B
CodePudding user response:
with large amounts of data, it is better to do sorting before grouping
tidyverse
library(tidyverse)
df<-tibble(ref=c("ABC", "ABC", "XYZ", "XYZ", "FGH", "FGH", "FGH"),
type=c("A", "B", "A", "A", "A", "A", "B"))
distinct(df) %>%
arrange(ref, desc(type)) %>%
group_by(ref) %>%
slice_head(n = 1) %>%
ungroup()
#> # A tibble: 3 × 2
#> ref type
#> <chr> <chr>
#> 1 ABC B
#> 2 FGH B
#> 3 XYZ A
data.table
Created on 2022-04-27 by the reprex package (v2.0.1)
df<-data.frame(ref=c("ABC", "ABC", "XYZ", "XYZ", "FGH", "FGH", "FGH"),
type=c("A", "B", "A", "A", "A", "A", "B"))
library(data.table)
setDT(df)[order(ref, -type), .SD[1], by = ref]
#> ref type
#> 1: ABC B
#> 2: FGH B
#> 3: XYZ A
Created on 2022-04-27 by the reprex package (v2.0.1)
CodePudding user response:
If you only have A
and B
, then you can arrange and simply get the first row, i.e.
library(dplyr)
data %>%
group_by(ref) %>%
filter(type %in% c('A', 'B')) %>% #If other types exist
arrange(desc(type)) %>%
slice(1L)
# A tibble: 3 x 2
# Groups: ref [3]
ref type
<chr> <chr>
1 ABC B
2 FGH B
3 XYZ A
CodePudding user response:
We can use which.max
over boolean to extract the desired rows
data %>%
group_by(ref) %>%
slice(which.max(type == "B")) %>%
ungroup()
which gives
# A tibble: 3 x 2
ref type
<chr> <chr>
1 ABC B
2 FGH B
3 XYZ A