I'm trying to get the values from a relatively simple averaging in R, but I'm having trouble finding the best solution.
To be clearer, I intend to create an additional column called TMA with mutate and inside it get the values of these averages for each row, as in the Excel image below:
CodePudding user response:
There are a variety of "rolling window" functions in various packages across R.
The following is based on {slider}
. If the syntax is not your style, look at {zoo}
and its rollmean()
function.
reproducible dataset/sample
You will find a lot of friends here, if you provide a reproducible data set.
I picked part of the data presented in your image. Think about doing this in the future.
I converted your month-year into a character of year-month(number). In R, I recommend you use proper Date or datetime objects for this. This can help with other transformations. ((But that is another discussion))
library(dplyr)
df <- tibble(
Data = c("2021-03", "2021-04","2021-05","2021-06","2021-07","2021-08")
,ANOM = c(-0.79,-0.72,-0.46,-0.28,-0.38,-0.45)
)
using {slider}
To understand what slider does print out the following:
library(slider)
> df %>% slide(~.x, .before = 2)
[[1]]
# A tibble: 1 x 2
Data ANOM
<chr> <dbl>
1 2021-03 -0.79
1 2021-03 -0.79
[[2]]
# A tibble: 2 x 2
Data ANOM
<chr> <dbl>
1 2021-03 -0.79
2 2021-04 -0.72
[[3]]
# A tibble: 3 x 2
Data ANOM
<chr> <dbl>
1 2021-03 -0.79
2 2021-04 -0.72
3 2021-05 -0.46
This builds a "sliding window" of size 3, i.e. the actual value and the 2 previous. Note that you may want to handle the partial windows differently. Check the documentation for it.
rolling mean for previous, current, and next row
Now let's use this for the desired sliding window along the ANOM
variable of your dataframe. As we know the output are doubles we use the slide_dbl()
function (otherwise you would get a list column).
The original question is about a rolling mean for the previous, current, and next row. Thus, we set the .before
and .after
appropriately in terms of steplength.
df %>%
mutate(TMA = slide_dbl(ANOM, mean, .before = 1, .after = 1))
# A tibble: 6 x 3
Data ANOM TMA
<chr> <dbl> <dbl>
1 2021-03 -0.79 -0.755
2 2021-04 -0.72 -0.657
3 2021-05 -0.46 -0.487
4 2021-06 -0.28 -0.373
5 2021-07 -0.38 -0.37
6 2021-08 -0.45 -0.415
CodePudding user response:
A slightly different and more verbose solution would be:
library(tidyverse)
df <- tibble(
Data = c("2021-03", "2021-04","2021-05","2021-06","2021-07","2021-08"),
ANOM = c(-0.79,-0.72,-0.46,-0.28,-0.38,-0.45))
df %>%
mutate(ANOM_lead = lead(ANOM),
ANOM_lag = lag(ANOM),
ANOM_avg = rowMeans(across(c(ANOM, ANOM_lead, ANOM_lag))))
# A tibble: 6 x 5
Data ANOM ANOM_lead ANOM_lag ANOM_avg
<chr> <dbl> <dbl> <dbl> <dbl>
1 2021-03 -0.79 -0.72 NA NA
2 2021-04 -0.72 -0.46 -0.79 -0.657
3 2021-05 -0.46 -0.28 -0.72 -0.487
4 2021-06 -0.28 -0.38 -0.46 -0.373
5 2021-07 -0.38 -0.45 -0.28 -0.37
6 2021-08 -0.45 NA -0.38 NA
Here, we first create the lead and lag variables for ANOM and then simply get the row mean. Whether you also want to calculate a mean for the first and teh last row is up to you, which you can control with the na.rm = TRUE
argument to the rowMeans
function. Of course, you can also omit the lead and lag variables afterwards if you want with a select
command.