Home > OS >  How to get the cummean of the last n rows before the last row using dplyr
How to get the cummean of the last n rows before the last row using dplyr

Time:06-12

If I have a data frame that I am going through using dplyr, how can I get the cummean of the last 2 rows before the last row?

For example...

 ------
   |value|
   -------
   | 1   |
   ------
   | 2   |
   -------
   | 3   |
   ------
   | 4   |
   -------

In this case, when the code goes through the value 4, I want the cummean to calculate (2 3)/2.

I am thinking something along the lines of this--

new_df -> df %>%
    summarise(
       roll_mean = cummean(value,-2)
    )

However I know this does not work

CodePudding user response:

Assuming that you mean is at each position calculate the mean of the prior two elements of value, use rollapply with a width of list(-seq(2)) which means that at each point use the offsets -1 and -2, i.e. one and two positions earlier.

library(dplyr, exclude = c("filter", "lag"))
library(zoo)

DF %>% mutate(mean = rollapply(value, list(-seq(2)), mean, fill = NA))
##   value mean
## 1     1   NA
## 2     2   NA
## 3     3  1.5
## 4     4  2.5

A variation is:

DF %>% mutate(mean = (rollsumr(value, 3, fill = NA) - value) / 2)

Note

Input in reproducible form:

DF <- data.frame(value = 1:4)

CodePudding user response:

[Using this data: df <- data.frame(value = 1:4)]

For two prior values, you could do it manually like so:

df %>% 
  mutate(roll_mean = (lag(value)   lag(value,2))/2)

#  value roll_mean
#1     1        NA
#2     2        NA
#3     3       1.5
#4     4       2.5

Or if the window should be bigger, you could use the difference of two cumsum values:

df %>% 
  mutate(cuml = cumsum(value),
         roll_mean = (lag(cuml) - lag(cuml,3,default=0))/2)


#  value cuml roll_mean
#1     1    1        NA
#2     2    3       0.5
#3     3    6       1.5
#4     4   10       2.5

Or I like slider:slide_* for window functions. Here we ask for the mean for a window that starts two elements prior (.before = 2) and ends one element prior (after = -1).

df %>%
  mutate(roll_mean = slider::slide_dbl(value, mean, .before = 2, .after = -1))

#  value roll_mean
#1     1       NaN
#2     2       1.0
#3     3       1.5
#4     4       2.5

By default slider allows incomplete windows; if we want the same output we could use:

df %>%
  mutate(roll_mean = slider::slide_dbl(value, mean, .before = 2, .after = -1, .complete = TRUE))

#  value roll_mean
#1     1        NA
#2     2        NA
#3     3       1.5
#4     4       2.5
  • Related