I'm new here, so maybe my question could be difficult to understand. So, I have some data and it's date information and I need to group the mean of the data in year ranges. But this year ranges are non-ecluding, I mean that, for example, my first range is: 2013-2015 then 2014-2016 then 2015-2017, etc. So I think that it could be done by using a loop function and dplyr, but I dont know how to do it. I´ll be very thankfull if someone can help me.
Thank you,
Alejandro
What I tried was like:
for (i in Year){
Year_3=c(i, i 1, i 2)
db>%> group_by(Year_3)
#....etc
}
CodePudding user response:
As you note, each observation would be used in multiple groups, so one approach could be to make copies of your data accordingly:
df <- data.frame(year = 2013:2020, value = 1:8)
library(dplyr)
df %>%
tidyr::uncount(3, .id = "grp") %>%
mutate(group_start = year - grp 1,
group_name = paste0(group_start, "-", group_start 2)) %>%
group_by(group_name) %>%
summarise(value = mean(value),
n = n())
# A tibble: 10 × 3
group_name value n
<chr> <dbl> <int>
1 2011-2013 1 1
2 2012-2014 1.5 2
3 2013-2015 2 3
4 2014-2016 3 3
5 2015-2017 4 3
6 2016-2018 5 3
7 2017-2019 6 3
8 2018-2020 7 3
9 2019-2021 7.5 2
10 2020-2022 8 1
Or we might take a more algebraic approach, noting that the sum of a three year period will be the difference between the cumulative amount two years in the future minus the cumulative amount the prior year. This approach excludes the partial ranges.
df %>%
mutate(cuml = cumsum(value),
value_3yr = (lead(cuml, n = 2) - lag(cuml, default = 0)) / 3)
year value cuml value_3yr
1 2013 1 1 2
2 2014 2 3 3
3 2015 3 6 4
4 2016 4 10 5
5 2017 5 15 6
6 2018 6 21 7
7 2019 7 28 NA
8 2020 8 36 NA