I learned to calculate moving averages in R with this snippet of code:
# just to create a distribution
x <- rnorm(n = 100, mean = 0, sd = 10)
mn <- function(n) rep (1/n,n)
filter(x, mn(5))
When I plot only the result of mn(5), I see that I get 1/5 repeated 5 times. Why does filter(x, mn(5)) calculate the average of five values? Where is the part in which the mean is calculated?
CodePudding user response:
1) The mean is the sum of the values divided by their count. Assuming x has 5 elements, each line below equals the one before it, so using coefficients of (1/5, 1/5, 1/5, 1/5, 1/5) in the sum is equivalent to taking the mean.
mean(x)
= (x[1] + x[2] + x[3] + x[4] + x[5])/5
= x[1]/5 + x[2]/5 + x[3]/5 + x[4]/5 + x[5]/5
= sum(x * c(1/5, 1/5, 1/5, 1/5, 1/5))
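The chain of equalities above can be checked numerically. This is a minimal sketch; the names x5, w and weighted_sum and the seed are chosen here only for illustration and are not part of the original snippet.

```r
set.seed(1)                          # arbitrary seed, only for reproducibility
x5 <- rnorm(5, mean = 0, sd = 10)    # a vector with exactly 5 elements
w  <- rep(1/5, 5)                    # the same weights that mn(5) returns

# Weighted sum with equal weights of 1/5
weighted_sum <- sum(x5 * w)

# Should agree with mean(x5) up to floating point tolerance
all.equal(weighted_sum, mean(x5))
```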
2) Another way to understand this is to note that mean is linear. That is, if x and y are two vectors of the same length then mean(x + y) = mean(x) + mean(y), and if a is any scalar then mean(a * x) = a * mean(x). It is known that any linear function that returns a scalar can be represented as the inner product of some fixed vector with the input. That is, there is a vector v such that
mean(x)
sum(v * x)
are equal for all x. Since this holds for all x, it must hold for x <- c(1, 0, 0, 0, 0), so these are equal
mean(c(1, 0, 0, 0, 0))
v[1] * x[1]
The second line equals v[1] because x[1] is 1, and the mean of c(1, 0, 0, 0, 0) in the first line equals 1/5, so v[1] is 1/5. Similarly for
mean(c(0, 1, 0, 0, 0))
v[2] * x[2]
etc. so v must equal c(1/5, 1/5, 1/5, 1/5, 1/5).
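The unit-vector argument can also be run directly: applying mean to each column of an identity matrix should recover v. This is only a sketch; n, v and the example values in x5 are illustrative choices.

```r
n <- 5

# Column i of diag(n) is the unit vector with a 1 in position i,
# so mean() of that column gives the i-th coefficient v[i]
v <- apply(diag(n), 2, mean)
v

# Check the inner product against mean() on arbitrary example values
x5 <- c(3, -1, 4, 1, 5)
all.equal(sum(v * x5), mean(x5))
```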
CodePudding user response:
Where is the part in which the mean is calculated?
See ?filter for the argument sides. The default value sides = 2 means "centered": the window is centered on the current position. You probably want sides = 1 to use past values only.
y <- filter(x, c(1/5, 1/5, 1/5, 1/5, 1/5), sides = 1)
is doing
y[1:4] = NA
y[5] = (x[1] + x[2] + x[3] + x[4] + x[5]) / 5 = mean(x[1:5])
y[6] = (x[2] + x[3] + x[4] + x[5] + x[6]) / 5 = mean(x[2:6])
...
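To confirm the pattern above, the filter output can be compared with means computed by hand. This is a sketch under a few assumptions: the seed and the names manual and yc are illustrative, and stats::filter is called explicitly in case another attached package (dplyr, for example) masks filter.

```r
set.seed(42)                               # arbitrary seed for reproducibility
x <- rnorm(n = 100, mean = 0, sd = 10)

# Trailing 5-point moving average: past values only
y <- stats::filter(x, rep(1/5, 5), sides = 1)

# The same thing by hand: mean of each window x[(i - 4):i]
manual <- sapply(5:length(x), function(i) mean(x[(i - 4):i]))

all(is.na(y[1:4]))                         # first four values have no full window
all.equal(as.numeric(y[5:100]), manual)    # the rest match the hand-computed means

# With the default sides = 2 the window is centered instead,
# so position 3 holds the mean of x[1:5]
yc <- stats::filter(x, rep(1/5, 5), sides = 2)
all.equal(as.numeric(yc[3]), mean(x[1:5]))
```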