I need to calculate the running average by year for each column. My data example:
dat <- data.frame(yr = c(1980, 1980, 1980, 1980, 1980, 1981, 1981, 1981, 1981, 1981, 1982, 1982, 1982, 1982, 1982), data1 = c(-10.16, -7.48, -3.31, -6.04, -11.68, -13.40, -10.41, -10.65, -6.70, -17.05, -25.62, -29.14, -16.65, -6.42, 0.28), data2 = c(2.30, -7.52, -13.26, -13.24, -14.74, -9.38, -8.93, -11.78, -14.07, -11.66, -8.82, -10.30, -7.99, -10.02, -15.36), data3 = c(-14.83, -15.08, -16.44, -18.95, -13.40, -7.16, -4.35, -1.61, -0.01, -0.35, -2.09, -3.12, 0.87, -0.06, 2.29))
The running average is calculated as follows:
library(TTR)
library(dplyr)
runavg <- with(dat, ave(data1, yr, FUN=function(x)
TTR::runMean(x, n=3)) )
The question is simple, but I would like to ask you to explain in more detail how to apply this code to every column? I tried various solutions using lapply, function, mutate... I just realized that I do not have enough knowledge in this area :(. I will be very grateful for the help.
CodePudding user response:
We may use across
in tidyverse
to loop across all the 'data' columns and creating new columns by applying the runMean
function
library(dplyr)
library(TTR)
dat <- dat %>%
group_by(yr) %>%
mutate(across(starts_with('data'),
~ runMean(., n = 3), .names = 'runavg_{.col}'))
-output
dat
# A tibble: 15 × 7
# Groups: yr [3]
yr data1 data2 data3 runavg_data1 runavg_data2 runavg_data3
<dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 1980 -10.2 2.3 -14.8 NA NA NA
2 1980 -7.48 -7.52 -15.1 NA NA NA
3 1980 -3.31 -13.3 -16.4 -6.98 -6.16 -15.4
4 1980 -6.04 -13.2 -19.0 -5.61 -11.3 -16.8
5 1980 -11.7 -14.7 -13.4 -7.01 -13.7 -16.3
6 1981 -13.4 -9.38 -7.16 NA NA NA
7 1981 -10.4 -8.93 -4.35 NA NA NA
8 1981 -10.6 -11.8 -1.61 -11.5 -10.0 -4.37
9 1981 -6.7 -14.1 -0.01 -9.25 -11.6 -1.99
10 1981 -17.0 -11.7 -0.35 -11.5 -12.5 -0.657
11 1982 -25.6 -8.82 -2.09 NA NA NA
12 1982 -29.1 -10.3 -3.12 NA NA NA
13 1982 -16.6 -7.99 0.87 -23.8 -9.04 -1.45
14 1982 -6.42 -10.0 -0.06 -17.4 -9.44 -0.77
15 1982 0.28 -15.4 2.29 -7.60 -11.1 1.03
The similar option in base R
is lapply
dat[paste0('runavg_', names(dat)[-1])] <- lapply(dat[-1], function(x) ave(x, dat$yr, FUN = function(u) runMean(u, n = 3)))
CodePudding user response:
1) This uses rollmeanr from zoo and also uses by from base R. rollmeanr can handle multiple columns at once providing a bit of simplification. This splits up dat[-1] by dat[[1]], i.e. by year, and then applies rollmeanr to each component. Then we rbind the components back together and cbind the original dat to it.
library(zoo)
avg <- do.call("rbind", by(dat[-1], dat[[1]], rollmeanr, 3, fill = NA))
cbind(dat, avg = avg)
giving:
yr data1 data2 data3 avg.data1 avg.data2 avg.data3
1 1980 -10.16 2.30 -14.83 NA NA NA
2 1980 -7.48 -7.52 -15.08 NA NA NA
3 1980 -3.31 -13.26 -16.44 -6.983333 -6.160000 -15.4500000
4 1980 -6.04 -13.24 -18.95 -5.610000 -11.340000 -16.8233333
5 1980 -11.68 -14.74 -13.40 -7.010000 -13.746667 -16.2633333
6 1981 -13.40 -9.38 -7.16 NA NA NA
7 1981 -10.41 -8.93 -4.35 NA NA NA
8 1981 -10.65 -11.78 -1.61 -11.486667 -10.030000 -4.3733333
9 1981 -6.70 -14.07 -0.01 -9.253333 -11.593333 -1.9900000
10 1981 -17.05 -11.66 -0.35 -11.466667 -12.503333 -0.6566667
11 1982 -25.62 -8.82 -2.09 NA NA NA
12 1982 -29.14 -10.30 -3.12 NA NA NA
13 1982 -16.65 -7.99 0.87 -23.803333 -9.036667 -1.4466667
14 1982 -6.42 -10.02 -0.06 -17.403333 -9.436667 -0.7700000
15 1982 0.28 -15.36 2.29 -7.596667 -11.123333 1.0333333
1a) Another possibility is to define your own Mean function which takes the mean if the years are all the same and returns NA otherwise. That can be used with rollapplyr.
library(zoo)
Mean <- function(x) if (var(x[, 1]) == 0) colMeans(x[, -1]) else NA
data.frame(dat, avg = rollapplyr(dat, 3, Mean, fill = NA, by.column = FALSE))
2) Using dplyr and zoo we can use group_by and group_modify like this. For each year the group_modify function runs rollmeanr.
library(dplyr, exclude = c("lag", "filter"))
library(zoo)
dat %>%
group_by(yr) %>%
group_modify(~ data.frame(., avg = rollmeanr(., 3, fill = NA)) ) %>%
ungroup