Counting unique strings across multiple groups in R

Time:07-30

Here's my data:

db = data.frame(plot = c('a', 'a', 'a', 'a', 'a',
                         'b', 'b', 'b', 'b', 'b',
                         'c', 'c', 'c', 'c', 'c'),
                spp = c('sp1', 'sp1', 'sp2', 'sp1', 'sp3',
                        'sp1', 'sp1', 'sp2', 'sp4', 'sp4',
                        'sp1', 'sp2', 'sp5', 'sp6', 'sp7'))

What I'm trying to do is find, for each plot, the number of distinct strings (species in spp) that occur in that plot and in no other plot.

I was using

library(dplyr)

db_sum <- db %>%
     group_by(plot) %>%
     summarise(n_unique = n_distinct(spp))

but that returns the number of distinct strings in each plot, not the number of strings that appear in that plot and in no other.

For example, the code above returns a = 3, b = 3, c = 5, but what I want is a = 1, b = 1, c = 3 (that is, a has one spp that is not shared with the other plots, and likewise for b and c).
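To make the target numbers concrete, here is a small base R sketch (not part of the original post) that counts how many distinct plots each species occurs in. A species counts as exclusive to a plot when that count is 1. The names plots_per_spp and exclusive are illustrative only.

```r
db <- data.frame(plot = c('a', 'a', 'a', 'a', 'a',
                          'b', 'b', 'b', 'b', 'b',
                          'c', 'c', 'c', 'c', 'c'),
                 spp = c('sp1', 'sp1', 'sp2', 'sp1', 'sp3',
                         'sp1', 'sp1', 'sp2', 'sp4', 'sp4',
                         'sp1', 'sp2', 'sp5', 'sp6', 'sp7'))

# For each species, count the distinct plots it occurs in
plots_per_spp <- tapply(db$plot, db$spp, function(p) length(unique(p)))
plots_per_spp
# sp1 sp2 sp3 sp4 sp5 sp6 sp7
#   3   3   1   1   1   1   1

# Species found in exactly one plot, i.e. not shared with any other plot
exclusive <- names(plots_per_spp)[plots_per_spp == 1]
exclusive
# "sp3" "sp4" "sp5" "sp6" "sp7"
```

sp3 belongs to a, sp4 to b, and sp5, sp6 and sp7 to c, which gives the desired a = 1, b = 1, c = 3.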

How can I proceed?

Answer:

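We could use base R with table(): drop duplicate plot/species rows with unique(), keep only the species that occur in exactly one of the remaining rows (that is, in a single plot), then tabulate those rows by plot: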

table(subset(unique(db), spp %in% names(which(table(spp) == 1)))$plot)

a b c 
1 1 3 
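Since the question already uses dplyr, the same logic can also be written as a pipeline. This is a sketch of an alternative (not the answerer's code), assuming dplyr 1.0 or later is installed; it uses distinct(), add_count() and count().

```r
library(dplyr)

db <- data.frame(plot = c('a', 'a', 'a', 'a', 'a',
                          'b', 'b', 'b', 'b', 'b',
                          'c', 'c', 'c', 'c', 'c'),
                 spp = c('sp1', 'sp1', 'sp2', 'sp1', 'sp3',
                         'sp1', 'sp1', 'sp2', 'sp4', 'sp4',
                         'sp1', 'sp2', 'sp5', 'sp6', 'sp7'))

db_sum <- db %>%
  distinct(plot, spp) %>%               # one row per plot/species pair
  add_count(spp, name = "n_plots") %>%  # number of plots each species occurs in
  filter(n_plots == 1) %>%              # keep species found in a single plot
  count(plot, name = "n_unique")        # number of exclusive species per plot

db_sum
```

As written, a plot with no exclusive species would not appear in db_sum at all, rather than showing a count of 0.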