I've got a df that looks like:
df(site=c(A,B,C,D,E), species=c(1,2,3,4), Year=c(1980:2010).
I would like to calculate the number of different years that each species appear in each site, creating a new column called nYear, I've tried filtering by group and using mutate combined with ndistinct values but it is not quite working.
Here is part of the code I have been using:
Df1 <- Df %>%
filter(Year>1985)%>%
mutate(nYear = n_distinct(Year[Year %in% site]))%>%
group_by(Species,Site, Year) %>%
arrange(Species, .by_group=TRUE)
ungroup()
CodePudding user response:
The approach is good, a few things to correct.
First, let's make some reproducible data (your code gave errors).
df <- data.frame("site"=LETTERS[1:5], "species"=1:5, "Year"=1981:2010)
You should have used summarise
instead of mutate
when you're looking to summarise values across groups. It will give you a shortened tibble
as an output, with only the groups and the summary figures present (fewer columns and rows).
mutate
on the other hand aims to modify an existing tibble
, keeping all rows and columns by default.
The order of your functions in the chains also needs to change.
df %>%
filter(Year>1985) %>%
group_by(species,site) %>%
summarise(nYear = length(unique(Year))) %>% # instead of mutate
arrange(species, .by_group=TRUE) %>%
ungroup()
First, group_by(species,site)
, not year, then summarise
and arrange
.
# A tibble: 5 × 3
species site nYear
<int> <chr> <int>
1 1 A 5
2 2 B 5
3 3 C 5
4 4 D 5
5 5 E 5
CodePudding user response:
You can use distinct()
on the filtered frame, and then count by your groups of interest:
distinct(Df %>% filter(Year>1985)) %>%
count(Site, Species,name = "nYear")