Find novel categories between groups

Time:08-20

I am trying to identify which tree species differ between two groups, a and b, within each forest type (type).

My dummy example:

dd1 <- data.frame(
  type = rep(1, 5),
  grp = c('a', 'a', 'a', 'b', 'b'),
  sp =  c('oak', 'beech', 'spruce',
          'oak', 'yew')
)

dd2 <- data.frame(
  type = rep(2, 3),
  grp = c('a', 'b', 'b'),
  sp =  c('oak', 'beech', 'spruce')
)


dd <- rbind(dd1, dd2)

I can list the unique species within each combination of type and grp using distinct():

library(dplyr)

dd %>% 
  group_by(type, grp) %>% 
  distinct(sp)

What I actually want to know is which species in group b do not occur in group a within the same type.

Expected output:

   type grp   sp    
  <dbl> <chr> <chr> 
1     1 b     yew    # here, only `yew` is a new one; `oak` was previously listed in group `a`
2     2 b     beech  # both beech and spruce are new compared to group `a` 
3     2 b     spruce

How can I do this? Thank you!

CodePudding user response:

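You could try an anti_join(). It keeps the rows of dd that have no match among the group a rows on both sp and type, so the group a rows drop out as well because each one matches itself:

library(dplyr)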

dd |>
  anti_join(dd |> filter(grp == "a"), by = c("sp", "type"))

Output:

  type grp     sp
1    1   b    yew
2    2   b  beech
3    2   b spruce
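
The same idea can be written with the two groups split out first, which may be easier to read and stays restricted to group b if more groups are added later. This is only a sketch: the names grp_a, grp_b and novel_b are made up for illustration, and dd is rebuilt here so the block runs on its own.

```r
library(dplyr)

# Same data as dd in the question, built in one step
dd <- data.frame(
  type = c(1, 1, 1, 1, 1, 2, 2, 2),
  grp  = c("a", "a", "a", "b", "b", "a", "b", "b"),
  sp   = c("oak", "beech", "spruce", "oak", "yew", "oak", "beech", "spruce")
)

# Hypothetical helper tables: one per group
grp_a <- filter(dd, grp == "a")
grp_b <- filter(dd, grp == "b")

# Keep the group b rows that have no (type, sp) match in group a
novel_b <- anti_join(grp_b, grp_a, by = c("type", "sp"))
novel_b
```

Joining on both type and sp is what makes the comparison happen within each forest type rather than across the whole table.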

CodePudding user response:

Group by type, then keep the group b rows whose sp does not appear among the group a species of the same type:

library(dplyr)

dd %>%
  group_by(type) %>%
  filter(grp == 'b' & !sp %in% sp[grp == 'a']) %>%
  ungroup()

# # A tibble: 3 × 3
#    type grp   sp    
#   <dbl> <chr> <chr> 
# 1     1 b     yew   
# 2     2 b     beech 
# 3     2 b     spruce