I am working on a research project and a while back I asked this question on Mathematics Stack Exchange, where I was looking for a way to calculate the variance of the period-to-period change in income given a transition matrix, where each state corresponds to a log level of income in a vector, which is given. I want to calculate what the variance of an individual's change in income is over some n number of periods given that they began in each state. My state space consists of 11 states, so I hope to end up with a vector consisting of 11 different variances. When I asked the question, I received a satisfactory answer, but I am running into some issues when trying to code it in R I was hoping to receive help with.
I have created this piece of code to calculate the variances:
install.packages("expm")
library(expm)
# creating standard basis vectors
e <- function(i) {
e_i = rep(0, length(alpha))
e_i[i] = 1
return(e_i)
}
# compute variances
p2p_variance = function(n, alpha, P) {
variance = list()
pi_n = list()
for (i in 1:length(alpha)) {
pi_n[[i]] = e(i) %*% (P %^% n)
beta = (t(alpha) - t(alpha)[i])^2
variance[[i]] = (pi_n[[i]] %*% t(beta)) - (((pi_n[[i]] %*% alpha) - alpha[i]) %^% 2)
}
return(t(variance))
}
And for my values of alpha (vector of log levels of income) and P (transition matrix) I use:
alpha = c(3.4965, 3.5835, 3.6636, 3.7377, 3.8067, 3.8712, 3.9318, 3.9890, 4.0431, 4.0943, 4.1431)
P = rbind(c(0.9004, 0.0734, 0.0203, 0.0043, 0.0010, 0.0003, 0.0001, 0.0001, 0.0000, 0.0000, 0.0000),
c(0.3359, 0.3498, 0.2401, 0.0589, 0.0115, 0.0026, 0.0007, 0.0003, 0.0001, 0.0001, 0.0000),
c(0.1583, 0.1538, 0.3931, 0.2346, 0.0481, 0.0090, 0.0021, 0.0007, 0.0003, 0.0001, 0.0001),
c(0.0746, 0.0609, 0.1600, 0.4368, 0.2178, 0.0397, 0.0073, 0.0019, 0.0006, 0.0002, 0.0001),
c(0.0349, 0.0271, 0.0559, 0.1724, 0.4628, 0.2031, 0.0344, 0.0067, 0.0018, 0.0006, 0.0003),
c(0.0155, 0.0122, 0.0230, 0.0537, 0.1817, 0.4870, 0.1860, 0.0316, 0.0066, 0.0018, 0.0009),
c(0.0066, 0.0054, 0.0100, 0.0204, 0.0529, 0.1956, 0.4925, 0.1772, 0.0307, 0.0064, 0.0023),
c(0.0025, 0.0023, 0.0043, 0.0084, 0.0186, 0.0530, 0.2025, 0.4980, 0.1760, 0.0275, 0.0067),
c(0.0009, 0.0009, 0.0017, 0.0035, 0.0072, 0.0168, 0.0490, 0.2025, 0.5194, 0.1721, 0.0260),
c(0.0003, 0.0003, 0.0007, 0.0013, 0.0029, 0.0061, 0.0142, 0.0430, 0.2023, 0.5485, 0.1804),
c(0.0001, 0.0001, 0.0002, 0.0003, 0.0008, 0.0017, 0.0032, 0.0068, 0.0212, 0.1079, 0.8578))
For instance, a call of p2p_variance(100, alpha, P)
(calculating the variance over 100 periods) results in the following vector of variances:
0.04393012 0.04091066 0.03856503 0.03636202 0.03472286 0.03331921 0.03213084 0.03068901 0.03143765 0.03255994 0.03522346
Which seem plausible. However, If I run p2p_variance(1000, alpha, P)
, it results in:
0.06126449 0.03445073 0.009621497 -0.01447615 -0.03652425 -0.05752316 -0.07753646 -0.09726683 -0.1134972 -0.1287498 -0.141676
This is obviously not correct, since we cannot have negative variance. I cannot figure out why simply increasing n to 1000 is resulting in negative variance here. I have most likely coded my p2p_variance function incorrectly, but I cannot for the life of me find the issue. Or perhaps is the process I am using to find these variances flawed somehow? I would really appreciate if anyone could look over this code and help me diagnose the issue
CodePudding user response:
Your variance function is returning the difference, and if you want the absolute value (variance) just wrap it inside abs()
like this:
p2p_variance = function(n, alpha, P) {
variance = list()
pi_n = list()
for (i in 1:length(alpha)) {
pi_n[[i]] = e(i) %*% (P %^% n)
beta = (t(alpha) - t(alpha)[i])^2
variance[[i]] = abs((pi_n[[i]] %*% t(beta)) - (((pi_n[[i]] %*% alpha) - alpha[i]) %^% 2))
}
return(t(variance))
}
p2p_variance(1000, alpha, P)
Output:
[,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11]
[1,] 0.06126449 0.03445073 0.009621497 0.01447615 0.03652425 0.05752316 0.07753646 0.09726683 0.1134972 0.1287498 0.141676