How to determine the number of unique values based on multiple criteria with dplyr

Time:08-04

I've got a df that looks like:

df(site=c(A,B,C,D,E), species=c(1,2,3,4), Year=c(1980:2010).

I would like to calculate the number of different years in which each species appears at each site, storing the result in a new column called nYear. I've tried filtering, grouping, and using mutate() combined with n_distinct(), but it is not quite working.

Here is part of the code I have been using:

Df1 <- Df %>%
  filter(Year>1985)%>%
  mutate(nYear = n_distinct(Year[Year %in% site]))%>%
  group_by(Species,Site, Year) %>% 
  arrange(Species, .by_group=TRUE) 
  ungroup()

Answer:

The approach is good, but a few things need correcting.

First, let's make some reproducible data (your code gave errors).

library(dplyr)

# 30 rows: site and species (length 5) are recycled against Year (length 30)
df <- data.frame("site"=LETTERS[1:5], "species"=1:5, "Year"=1981:2010)

You should use summarise() instead of mutate() when you want to summarise values across groups. It returns a shorter tibble containing only the grouping columns and the summary values, with one row per group (so fewer rows and columns).

mutate(), on the other hand, adds or modifies columns in an existing tibble, keeping all rows and columns by default.
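To make the difference concrete, here is a small sketch on the reproducible `df` from above (recreated so the snippet runs on its own). It uses `n_distinct(Year)`, dplyr's shorthand for `length(unique(Year))`. With the grouping in place, `summarise()` returns one row per site/species pair, while `mutate()` keeps every filtered row and repeats the group's count on each one, which is closer to the original goal of adding an `nYear` column. The object names `per_group` and `per_row` are just illustrative.

```r
library(dplyr)

df <- data.frame("site" = LETTERS[1:5], "species" = 1:5, "Year" = 1981:2010)

# summarise(): one row per group, only the grouping columns plus the summary
per_group <- df %>%
  filter(Year > 1985) %>%
  group_by(species, site) %>%
  summarise(nYear = n_distinct(Year), .groups = "drop")

# mutate(): every filtered row is kept; nYear is repeated within each group
per_row <- df %>%
  filter(Year > 1985) %>%
  group_by(species, site) %>%
  mutate(nYear = n_distinct(Year)) %>%
  ungroup()

nrow(per_group)  # 5
nrow(per_row)    # 25
```

Note that `mutate()` only computes per group when `group_by()` comes before it in the chain.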

The order of the functions in your chain also needs to change: in your version mutate() runs before group_by(), so it is not computed per group.

df %>%
  filter(Year > 1985) %>%
  group_by(species, site) %>%
  summarise(nYear = length(unique(Year))) %>% # instead of mutate
  arrange(species, .by_group = TRUE) %>%
  ungroup()

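First, group_by(species, site) only, not Year: if Year is also a grouping variable, every group contains a single year and nYear is always 1. Then summarise and arrange.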

# A tibble: 5 × 3
  species site  nYear
    <int> <chr> <int>
1       1 A         5
2       2 B         5
3       3 C         5
4       4 D         5
5       5 E         5
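A quick sketch of why Year must stay out of `group_by()`, again on the reproducible `df` (the name `by_year_too` is illustrative only): each site/species/year combination becomes its own group, so counting distinct years inside it can only ever give 1.

```r
library(dplyr)

df <- data.frame("site" = LETTERS[1:5], "species" = 1:5, "Year" = 1981:2010)

# Grouping by Year as well: each group now holds exactly one year,
# so the distinct-year count is 1 for every group
by_year_too <- df %>%
  filter(Year > 1985) %>%
  group_by(species, site, Year) %>%
  summarise(nYear = n_distinct(Year), .groups = "drop")

nrow(by_year_too)          # 25 groups, one per remaining row
unique(by_year_too$nYear)  # 1
```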

Answer:

You can apply distinct() to the filtered frame, which leaves one row per site/species/year combination (as long as those are the only columns), and then count rows per group of interest:

Df %>%
  filter(Year > 1985) %>%
  distinct() %>%
  count(Site, Species, name = "nYear")
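As a sanity check, the sketch below runs both approaches on the reproducible `df` from the first answer, so it assumes that answer's lower-case column names (`site`, `species`) rather than `Df`, `Site` and `Species`. It passes explicit columns to `distinct()` as a hedge: if the real data had extra columns, duplicates in those would otherwise inflate the count. The names `via_summarise` and `via_count` are hypothetical.

```r
library(dplyr)

df <- data.frame("site" = LETTERS[1:5], "species" = 1:5, "Year" = 1981:2010)

# Approach 1: group, then count distinct years per group
via_summarise <- df %>%
  filter(Year > 1985) %>%
  group_by(species, site) %>%
  summarise(nYear = length(unique(Year)), .groups = "drop")

# Approach 2: drop duplicate site/species/year rows, then count rows per group
via_count <- df %>%
  filter(Year > 1985) %>%
  distinct(species, site, Year) %>%
  count(species, site, name = "nYear")

identical(via_summarise$nYear, via_count$nYear)  # TRUE
```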