I am trying to identify which tree species differ between two groups, `a` and `b`, across different forest types (`type`).
My dummy example:
dd1 <- data.frame(
  type = rep(1, 5),
  grp = c('a', 'a', 'a', 'b', 'b'),
  sp = c('oak', 'beech', 'spruce', 'oak', 'yew')
)
dd2 <- data.frame(
  type = rep(2, 3),
  grp = c('a', 'b', 'b'),
  sp = c('oak', 'beech', 'spruce')
)
dd <- rbind(dd1, dd2)
I can find the unique species within each group (in reality, two grouping variables: `type` and `grp`) with `distinct`:
library(dplyr)
dd %>%
  group_by(type, grp) %>%
  distinct(sp)
But what I actually want to know is: which trees in group `b` are not present in group `a`, within each forest type?
Expected output:
type grp sp
<dbl> <chr> <chr>
1 1 b yew # here, only `yew` is a new one; `oak` was previously listed in group `a`
2 2 b beech # both beech and spruce are new compared to group `a`
3 2 b spruce
How can I do this? Thank you!
CodePudding user response:
You could try an `anti_join`, which keeps the rows that have no match in group `a` on both `sp` and `type`:
library(dplyr)
# Drop every row whose (sp, type) pair also occurs in group "a"
dd |>
  anti_join(dd |> filter(grp == "a"), by = c("sp", "type"))
Output:
type grp sp
1 1 b yew
2 2 b beech
3 2 b spruce
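Note that the `anti_join` above returns only group `b` rows because the example data contains just two groups. If the real data had further groups, their unmatched rows would come through as well. A minimal sketch of a stricter variant, assuming only `b` should be reported (the names `ref_a` and `new_in_b` are made up for illustration):

```r
library(dplyr)

# Same data as in the question, written out in one data frame
dd <- data.frame(
  type = c(1, 1, 1, 1, 1, 2, 2, 2),
  grp  = c("a", "a", "a", "b", "b", "a", "b", "b"),
  sp   = c("oak", "beech", "spruce", "oak", "yew", "oak", "beech", "spruce")
)

# Reference set: species recorded in group "a" for each forest type
ref_a <- dd |> filter(grp == "a")

# Keep only group "b" rows that have no (type, sp) match in the reference set
new_in_b <- dd |>
  filter(grp == "b") |>
  anti_join(ref_a, by = c("type", "sp"))

new_in_b
```

On the dummy data this should give the same three rows as above (`yew` for type 1, `beech` and `spruce` for type 2). The native pipe `|>` needs R 4.1 or later; `%>%` works the same way here.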
CodePudding user response:
Group by `type`, then filter for the rows of group `b` whose species does not occur in group `a` of the same type:
library(dplyr)
dd %>%
  group_by(type) %>%
  filter(grp == 'b' & !sp %in% sp[grp == 'a']) %>%
  ungroup()
# # A tibble: 3 × 3
# type grp sp
# <dbl> <chr> <chr>
# 1 1 b yew
# 2 2 b beech
# 3 2 b spruce