I am trying to find the average rate of a certain virus over three periods (2002-2021, 2002-2012, and 2002-2007), grouped by another variable, "Jurisdiction". The code I have right now is:
avgrate20 <- ratesmerge %>%
  group_by(Jurisdiction) %>%
  summarize(
    Years = paste(range(2002:2021), collapse = "-"),
    across(starts_with("rate"), mean)
  )
When I change Years = paste(range(2002:2021), collapse = "-") to use 2002:2012, it still takes the mean over 2002-2021.
Here is my output when doing head(df).
Any help would be appreciated.
CodePudding user response:
Years = paste(range(yrs_wanted), collapse = "-") simply creates a column called Years containing the string "2002-2021"; this doesn't tell R anything about which rows to include when computing the mean. For that, you need dplyr::filter().
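library(dplyr)

yrs_wanted <- 2002:2021

avgrate20 <- ratesmerge %>%
  # keep only the rows whose year falls in the wanted range
  filter(MMWR_YEAR %in% yrs_wanted) %>%
  group_by(Jurisdiction) %>%
  summarize(
    # label column only; the filter() above does the actual subsetting
    Years = paste(range(yrs_wanted), collapse = "-"),
    across(starts_with("rate"), mean)
  )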
If you want to get fancy, you can loop through your year ranges using purrr::map_dfr():
library(dplyr)
library(purrr)

# one element per period to summarize
year_ranges <- list(
  2002:2021,
  2002:2012,
  2002:2007
)

# run the same filter + summarize for each range and row-bind the results
avgrates <- map_dfr(
  year_ranges,
  ~ ratesmerge %>%
    filter(MMWR_YEAR %in% .x) %>%
    group_by(Jurisdiction) %>%
    summarize(
      Years = paste(range(.x), collapse = "-"),
      across(starts_with("rate"), mean)
    )
)
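
To check the pattern without the real data, here is a minimal sketch on a made-up data frame. The name toy, its values, and the column rate1 are hypothetical stand-ins for ratesmerge and its rate columns. It also uses map() followed by list_rbind() rather than map_dfr(), since map_dfr() is marked as superseded in purrr 1.0.0 and later; this assumes that version or newer is installed.

```r
library(dplyr)
library(purrr)

# Hypothetical toy data standing in for ratesmerge (values are invented)
toy <- tibble(
  Jurisdiction = rep(c("A", "B"), each = 3),
  MMWR_YEAR    = rep(c(2002, 2007, 2012), times = 2),
  rate1        = c(1, 2, 6, 10, 20, 60)
)

year_ranges <- list(2002:2012, 2002:2007)

avg_toy <- year_ranges %>%
  map(function(yrs) {
    toy %>%
      # subset to the current period before averaging
      filter(MMWR_YEAR %in% yrs) %>%
      group_by(Jurisdiction) %>%
      summarize(
        Years = paste(range(yrs), collapse = "-"),
        across(starts_with("rate"), mean)
      )
  }) %>%
  # stack the per-period summaries into one data frame
  list_rbind()

print(avg_toy)
```

Each period contributes one row per jurisdiction, so the means differ between periods because filter() changes which rows go into each average.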