How to create a Transition Matrix (Markov) in R?

I've read similar questions here, but I was not able to get the code right, so any help would be much appreciated.

I have a financial dataset similar to this:

Year  ID  Return  Quintile
1      A   -0,3      1
2      A   -0,2      2
3      B    1,5      5
3      C    0,1      3
4      C    0,1      3

For each year, I have information on many investors' IDs. For some IDs I have 20 years of data; for others, only 2 or 3 years. I was able to rank the data into performance quintiles (based on returns) from 1 to 5. I need to build a transition probability matrix.
Basically, I need to know the probability that an ID that was in quintile 1 remains in quintile 1 the next year, or moves into quintile 2, 3, 4 or 5. I also need to do the same for a 2-year horizon.

It should look something like this (first and second tables, with 1, 2, 3, etc. instead of A, B, C):


P.S. Do I need to balance the data frame, i.e. only analyze IDs that have, let's say, 10 years of consecutive data? Otherwise, an ID that only has two years of returns and changes its quintile has a 100% probability of changing state and may distort the end result.

Thanks a lot, and sorry for the long post.

CodePudding user response：

# dec="," so that values such as -0,3 are read as numbers, not text
dat1 <- read.table(text="
Year  ID  Return  Quintile
1      A   -0,3      1
2      A   -0,2      2
3      B    1,5      5
3      C    0,1      3
4      C    0,1      3
", header=TRUE, dec=",")

You can use statetable.msm from the msm package to count the transitions in your dataset, stratified by ID:

msm::statetable.msm(Quintile, ID, data=dat1)

    to
from 1 2 3 5
   1 0 1 0 0
   3 0 0 1 0

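This gives you counts; wrapping the result in prop.table(..., margin = 1) converts them to row proportions. Note that statetable.msm pairs each observation with the next one for the same ID, so the result is a one-year transition table only if every subject has consecutive years.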
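
If some IDs have gaps, one alternative is to pair observations by year explicitly in base R. The following is only a minimal sketch: the toy panel `dat` and the helper `transition_counts` are made up for illustration (they are not part of msm), and the horizon `h` is assumed to be measured in years.

```r
# Hypothetical toy panel: three IDs, quintiles 1-5, with a gap for ID "B".
dat <- data.frame(
  ID       = c("A", "A", "A", "B", "B", "C", "C", "C"),
  Year     = c(1, 2, 3, 1, 3, 2, 3, 4),
  Quintile = c(1, 2, 2, 5, 4, 3, 3, 1)
)

# Pair each observation with the same ID's observation exactly h years later.
# An ID with no observation in year t + h contributes no pair for year t.
transition_counts <- function(d, h = 1, states = 1:5) {
  later <- d
  later$Year <- later$Year - h   # shift so that year t + h lines up with year t
  pairs <- merge(d, later, by = c("ID", "Year"), suffixes = c("_from", "_to"))
  table(
    from = factor(pairs$Quintile_from, levels = states),
    to   = factor(pairs$Quintile_to,   levels = states)
  )
}

counts1 <- transition_counts(dat, h = 1)   # one-year transitions
counts2 <- transition_counts(dat, h = 2)   # two-year transitions

# Row proportions; rows with no observed transitions come out as NaN.
P1 <- prop.table(counts1, margin = 1)
P2 <- prop.table(counts2, margin = 1)
```

Because the counts are pooled over all IDs before the rows are normalised, an ID with only two years of data contributes a single transition to the pool rather than a 100% row of its own, so balancing the panel is not strictly required for this kind of count-based estimate.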

With this tiny amount of data it's not possible, but to estimate the transition matrix properly you could use the msm function as follows:

# set allowed transitions
qm <- matrix(1,nrow=5, ncol=5)
# estimate the matrix
q1 <- msm::msm(Quintile ~ Year , subject = ID, data=dat1, qmatrix = qm)

You could then take the fitted model and calculate the transition probability matrix for any given time period, for example with msm::pmatrix.msm(q1, t = 2) for a two-year horizon. But this doesn't work with only two transitions observed. Look at the documentation for msm to understand how it works.
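
For intuition about what "any given time period" means, here is the discrete-time analogue in base R. It is a sketch only: the matrix `P` below uses made-up numbers and three states instead of five for brevity, and it assumes the one-year probabilities are the same every year (a time-homogeneous Markov chain).

```r
# Hypothetical one-year transition matrix over three states (rows sum to 1).
states <- c("1", "2", "3")
P <- matrix(c(0.6, 0.3, 0.1,
              0.2, 0.5, 0.3,
              0.1, 0.3, 0.6),
            nrow = 3, byrow = TRUE,
            dimnames = list(from = states, to = states))

# Under the Markov assumption, the two-year matrix is the one-year matrix squared.
P_two_year <- P %*% P

# Each row of the result is still a probability distribution.
rowSums(P_two_year)
```

Comparing such a squared matrix with two-year transitions counted directly from the data is one informal way to see how well the Markov assumption holds.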