How to calculate the average of the value of the previous row the value of the current row the v-CodePudding

I'm trying to get the values from a relatively simple averaging in R, but I'm having trouble finding the best solution.

To be clearer, I intend to create an additional column called TMA with mutate and inside it get the values of these averages for each row, as in the Excel image below:

Excel easy way example

CodePudding user response：

There are a variety of "rolling window" functions in various packages across R. The following is based on {slider}. If the syntax is not your style, look at {zoo} and its rollmean() function.

reproducible dataset/sample
You will find a lot of friends here, if you provide a reproducible data set. I picked part of the data presented in your image. Think about doing this in the future.

I converted your month-year into a character of year-month(number). In R, I recommend you use proper Date or datetime objects for this. This can help with other transformations. ((But that is another discussion))

library(dplyr)

df <- tibble(
    Data = c("2021-03", "2021-04","2021-05","2021-06","2021-07","2021-08")
    ,ANOM = c(-0.79,-0.72,-0.46,-0.28,-0.38,-0.45)
)

using {slider}
To understand what slider does print out the following:

library(slider)

> df %>% slide(~.x, .before = 2)
[[1]]
# A tibble: 1 x 2
  Data     ANOM
  <chr>   <dbl>
1 2021-03 -0.79
1 2021-03 -0.79

[[2]]
# A tibble: 2 x 2
  Data     ANOM
  <chr>   <dbl>
1 2021-03 -0.79
2 2021-04 -0.72

[[3]]
# A tibble: 3 x 2
  Data     ANOM
  <chr>   <dbl>
1 2021-03 -0.79
2 2021-04 -0.72
3 2021-05 -0.46

This builds a "sliding window" of size 3, i.e. the actual value and the 2 previous. Note that you may want to handle the partial windows differently. Check the documentation for it.

rolling mean for previous, current, and next row
Now let's use this for the desired sliding window along the ANOM variable of your dataframe. As we know the output are doubles we use the slide_dbl() function (otherwise you would get a list column).
The original question is about a rolling mean for the previous, current, and next row. Thus, we set the .before and .after appropriately in terms of steplength.

df %>% 
  mutate(TMA = slide_dbl(ANOM, mean, .before = 1, .after = 1))

# A tibble: 6 x 3
  Data     ANOM    TMA
  <chr>   <dbl>  <dbl>
1 2021-03 -0.79 -0.755
2 2021-04 -0.72 -0.657
3 2021-05 -0.46 -0.487
4 2021-06 -0.28 -0.373
5 2021-07 -0.38 -0.37 
6 2021-08 -0.45 -0.415

CodePudding user response：

A slightly different and more verbose solution would be:

library(tidyverse)

df <- tibble(
    Data = c("2021-03", "2021-04","2021-05","2021-06","2021-07","2021-08"),
    ANOM = c(-0.79,-0.72,-0.46,-0.28,-0.38,-0.45))

df %>%
  mutate(ANOM_lead = lead(ANOM),
         ANOM_lag  = lag(ANOM),
         ANOM_avg  = rowMeans(across(c(ANOM, ANOM_lead, ANOM_lag))))

# A tibble: 6 x 5
  Data     ANOM ANOM_lead ANOM_lag ANOM_avg
  <chr>   <dbl>     <dbl>    <dbl>    <dbl>
1 2021-03 -0.79     -0.72    NA      NA    
2 2021-04 -0.72     -0.46    -0.79   -0.657
3 2021-05 -0.46     -0.28    -0.72   -0.487
4 2021-06 -0.28     -0.38    -0.46   -0.373
5 2021-07 -0.38     -0.45    -0.28   -0.37 
6 2021-08 -0.45     NA       -0.38   NA

Here, we first create the lead and lag variables for ANOM and then simply get the row mean. Whether you also want to calculate a mean for the first and teh last row is up to you, which you can control with the na.rm = TRUE argument to the rowMeans function. Of course, you can also omit the lead and lag variables afterwards if you want with a select command.