Average rate for 20 years, 10 years, and 5 years in R

Time:03-05

I am trying to find the average rate of a certain virus over 2002-2021, 2002-2012, and 2002-2007, grouped by another variable, Jurisdiction. The code I have right now is:

avgrate20 <- ratesmerge %>%
  group_by(Jurisdiction) %>%
  summarize(
    Years = paste(range(2002:2021), collapse = "-"),
    across(starts_with("rate"), mean)
  )

When I change Years = paste(range(2002:2021), collapse = "-") to use 2002:2012 instead, it still takes the mean over 2002-2021.

Here is my output from head(df): (image not available)

Any help would be appreciated.

CodePudding user response:

Years = paste(range(2002:2021), collapse = "-") simply creates a column called Years containing the string "2002-2021". It is only a label: it doesn't tell R which rows to include when computing the mean. For that, you need to subset the rows first with dplyr::filter().

library(dplyr)

yrs_wanted <- 2002:2021

avgrate20 <- ratesmerge %>%
  filter(MMWR_YEAR %in% yrs_wanted) %>%
  group_by(Jurisdiction) %>%
  summarize(
    Years = paste(range(yrs_wanted), collapse = "-"),
    across(starts_with("rate"), mean)
  )
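To see that the filter, not the label, is what changes the result, here is a minimal sketch on made-up data. The toy data frame and the helper avg_for_years() are hypothetical and only for illustration; the column names Jurisdiction, MMWR_YEAR, and rate are assumed to match ratesmerge.

```r
library(dplyr)

# Hypothetical toy data standing in for ratesmerge (values are made up)
toy <- tibble(
  Jurisdiction = rep(c("A", "B"), each = 4),
  MMWR_YEAR    = rep(c(2002, 2007, 2012, 2021), times = 2),
  rate         = c(1, 2, 3, 10, 2, 4, 6, 20)
)

# Hypothetical helper: keep only the wanted years, then average per jurisdiction
avg_for_years <- function(data, yrs_wanted) {
  data %>%
    filter(MMWR_YEAR %in% yrs_wanted) %>%
    group_by(Jurisdiction) %>%
    summarize(
      Years = paste(range(yrs_wanted), collapse = "-"),
      across(starts_with("rate"), mean)
    )
}

avg_all   <- avg_for_years(toy, 2002:2021)  # uses all four years per jurisdiction
avg_short <- avg_for_years(toy, 2002:2007)  # uses only 2002 and 2007
```

With this toy data the two results differ (for jurisdiction A, 4 versus 1.5), which is the behavior the original code was missing.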

If you want to get fancy, you can loop through your year ranges using purrr::map_dfr():

library(dplyr)
library(purrr)

year_ranges <- list(
  2002:2021,
  2002:2012,
  2002:2007
)

avgrates <- map_dfr(
  year_ranges,
  ~ ratesmerge %>%
    filter(MMWR_YEAR %in% .x) %>%
    group_by(Jurisdiction) %>%
    summarize(
      Years = paste(range(.x), collapse = "-"),
      across(starts_with("rate"), mean)
    )
)
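As a side note, purrr 1.0.0 and later mark map_dfr() as superseded in favor of map() followed by list_rbind(). A sketch of the same loop in that style, assuming purrr >= 1.0.0 is installed and using a hypothetical toy data frame in place of ratesmerge:

```r
library(dplyr)
library(purrr)

# Hypothetical toy data standing in for ratesmerge (values are made up)
toy <- tibble(
  Jurisdiction = rep(c("A", "B"), each = 4),
  MMWR_YEAR    = rep(c(2002, 2007, 2012, 2021), times = 2),
  rate         = c(1, 2, 3, 10, 2, 4, 6, 20)
)

year_ranges <- list(2002:2021, 2002:2012, 2002:2007)

avgrates_toy <- year_ranges %>%
  # One summary table per year range
  map(function(yrs) {
    toy %>%
      filter(MMWR_YEAR %in% yrs) %>%
      group_by(Jurisdiction) %>%
      summarize(
        Years = paste(range(yrs), collapse = "-"),
        across(starts_with("rate"), mean)
      )
  }) %>%
  # Stack the per-range tables into a single data frame
  list_rbind()
```

Writing the callback as an explicit function(yrs) rather than a ~ formula also avoids confusion over .x if you later pass a formula-style lambda to across(), for example to set na.rm = TRUE.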