Home > Software design >  Implementing a function to calculate variance of period-to-period change of markov chain
Implementing a function to calculate variance of period-to-period change of markov chain

Time:03-22

I am working on a research project and a while back I asked this question on Mathematics Stack Exchange, where I was looking for a way to calculate the variance of the period-to-period change in income given a transition matrix, where each state corresponds to a log level of income in a vector, which is given. I want to calculate what the variance of an individual's change in income is over some n number of periods given that they began in each state. My state space consists of 11 states, so I hope to end up with a vector consisting of 11 different variances. When I asked the question, I received a satisfactory answer, but I am running into some issues when trying to code it in R I was hoping to receive help with.

I have created this piece of code to calculate the variances:

install.packages("expm")
library(expm)

# creating standard basis vectors
e <- function(i) {
  e_i = rep(0, length(alpha))
  e_i[i] = 1
  return(e_i)
}

# compute variances
p2p_variance = function(n, alpha, P) {
  variance = list()
  pi_n = list()
  for (i in 1:length(alpha)) {
    pi_n[[i]] = e(i) %*% (P %^% n)
    beta = (t(alpha) - t(alpha)[i])^2
    variance[[i]] = (pi_n[[i]] %*% t(beta)) - (((pi_n[[i]] %*% alpha) - alpha[i]) %^% 2)
  }
  return(t(variance))
}

And for my values of alpha (vector of log levels of income) and P (transition matrix) I use:

alpha = c(3.4965, 3.5835, 3.6636, 3.7377, 3.8067, 3.8712, 3.9318,  3.9890, 4.0431, 4.0943, 4.1431)
P = rbind(c(0.9004, 0.0734, 0.0203, 0.0043, 0.0010, 0.0003, 0.0001, 0.0001, 0.0000, 0.0000, 0.0000),
          c(0.3359, 0.3498, 0.2401, 0.0589, 0.0115, 0.0026, 0.0007, 0.0003, 0.0001, 0.0001, 0.0000),
          c(0.1583, 0.1538, 0.3931, 0.2346, 0.0481, 0.0090, 0.0021, 0.0007, 0.0003, 0.0001, 0.0001),
          c(0.0746, 0.0609, 0.1600, 0.4368, 0.2178, 0.0397, 0.0073, 0.0019, 0.0006, 0.0002, 0.0001),
          c(0.0349, 0.0271, 0.0559, 0.1724, 0.4628, 0.2031, 0.0344, 0.0067, 0.0018, 0.0006, 0.0003),
          c(0.0155, 0.0122, 0.0230, 0.0537, 0.1817, 0.4870, 0.1860, 0.0316, 0.0066, 0.0018, 0.0009),
          c(0.0066, 0.0054, 0.0100, 0.0204, 0.0529, 0.1956, 0.4925, 0.1772, 0.0307, 0.0064, 0.0023),
          c(0.0025, 0.0023, 0.0043, 0.0084, 0.0186, 0.0530, 0.2025, 0.4980, 0.1760, 0.0275, 0.0067),
          c(0.0009, 0.0009, 0.0017, 0.0035, 0.0072, 0.0168, 0.0490, 0.2025, 0.5194, 0.1721, 0.0260),
          c(0.0003, 0.0003, 0.0007, 0.0013, 0.0029, 0.0061, 0.0142, 0.0430, 0.2023, 0.5485, 0.1804),
          c(0.0001, 0.0001, 0.0002, 0.0003, 0.0008, 0.0017, 0.0032, 0.0068, 0.0212, 0.1079, 0.8578))

For instance, a call of p2p_variance(100, alpha, P) (calculating the variance over 100 periods) results in the following vector of variances:

0.04393012 0.04091066 0.03856503 0.03636202 0.03472286 0.03331921 0.03213084 0.03068901 0.03143765 0.03255994 0.03522346

Which seem plausible. However, If I run p2p_variance(1000, alpha, P), it results in:

0.06126449 0.03445073 0.009621497 -0.01447615 -0.03652425 -0.05752316 -0.07753646 -0.09726683 -0.1134972 -0.1287498 -0.141676

This is obviously not correct, since we cannot have negative variance. I cannot figure out why simply increasing n to 1000 is resulting in negative variance here. I have most likely coded my p2p_variance function incorrectly, but I cannot for the life of me find the issue. Or perhaps is the process I am using to find these variances flawed somehow? I would really appreciate if anyone could look over this code and help me diagnose the issue

CodePudding user response:

Your variance function is returning the difference, and if you want the absolute value (variance) just wrap it inside abs() like this:

p2p_variance = function(n, alpha, P) {
  variance = list()
  pi_n = list()
  for (i in 1:length(alpha)) {
    pi_n[[i]] = e(i) %*% (P %^% n)
    beta = (t(alpha) - t(alpha)[i])^2
    variance[[i]] = abs((pi_n[[i]] %*% t(beta)) - (((pi_n[[i]] %*% alpha) - alpha[i]) %^% 2))
  }
  return(t(variance))
}
p2p_variance(1000, alpha, P)

Output:

     [,1]       [,2]       [,3]        [,4]       [,5]       [,6]       [,7]       [,8]       [,9]      [,10]     [,11]   
[1,] 0.06126449 0.03445073 0.009621497 0.01447615 0.03652425 0.05752316 0.07753646 0.09726683 0.1134972 0.1287498 0.141676
  • Related