I am using the below MWE code to generate a data frame of transition frequencies. It works well and fast. I am new to the data.table
package and am having trouble converting it to show balance transitions.
To start, below is the example data frame, the transition frequency output when running the functions (using the two time measurements of "Period_1" and "Period_2"), and the underlying MWE code for those functions, all of which work as intended for transition frequencies:
> data
ID Period_1 Period_2 Values State
1: 1 1 2020-01 5 X0
2: 1 2 2020-02 10 X1
3: 1 3 2020-03 15 X2
4: 2 1 2020-04 0 X0
5: 2 2 2020-05 2 X2
6: 2 3 2020-06 4 X0
7: 3 1 2020-02 3 X2
8: 3 2 2020-03 6 X1
9: 3 3 2020-04 9 X0
> setDT(data)
> num_transit(data, "2020-02", "2020-04",refvar="Period_2")
to_state X0 X1 X2
1: X0 NA NA 1
2: X1 NA NA NA
3: X2 NA NA NA
> setDT(data)
> num_transit(data, 1,3, refvar="Period_1")
to_state X0 X1 X2
1: X0 1 NA 1
2: X1 NA NA NA
3: X2 1 NA NA
library(data.table)
data <-
data.frame(
ID = c(1,1,1,2,2,2,3,3,3),
Period_1 = c(1, 2, 3, 1, 2, 3, 1, 2, 3),
Period_2 = c("2020-01","2020-02","2020-03","2020-04","2020-05","2020-06","2020-02","2020-03","2020-04"),
Values = c(5, 10, 15, 0, 2, 4, 3, 6, 9),
State = c("X0","X1","X2","X0","X2","X0", "X2","X1","X0")
)
num_transit <- function(x,from,to,refvar="Period_2", return_matrix=T) {
res <- x[get(refvar) %in% c(to,from), if(.N>1) .SD, by=ID, .SDcols = c(refvar, "State")]
res <- res[, id:=1:.N, by=ID]
res <- dcast(res, ID~id, value.var="State")[,.N, .(`1`,`2`)]
setnames(res,c("from","to", "ct"))
if(return_matrix) return(convert_transits_to_matrix(res, unique(x$State)))
res
}
convert_transits_to_matrix <- function(transits,states) {
m = matrix(NA, nrow=length(states), ncol=length(states), dimnames=list(states,states))
m[as.matrix(transits[,.(to,from)])] <- transits$ct
m = data.table(m)[,to_state:=rownames(m)]
setcolorder(m,"to_state")
return(m[])
}
Here is where I need help. I am trying to modify the above (call it "val_transit") to show the transition of the "Values" TO the new state. So the output would look like this, using the data
dataframe and running Period_2 transitions from 1 to 3 (or val_transit(data, 1,3, refvar="Period_1")
):
to_state X0 X1 X2
1: X0 4 NA 9
2: X1 NA NA NA
3: X2 15 NA NA
Any suggestions for doing this? This is a follow-on to transition frequency post How to create a table that measures transitions of elements over calendar periods?
CodePudding user response:
Sure, here is an update to the previous num_transit
function. Notice the differences
.SDcols
included bothState
andValues
in the first line of the functionvalue.vars
in thedcast
call includes bothState
andValue
- As a result of (2) above, I group explicitly over
State_1
,State_2
, instead of1
and2
, and the summarizing action is to sum theValues
- The
setnames
call is adjusted to return the final column asValues
ifreturn_matrix=F
val_transit <- function(x,from,to,refvar="Period_2", return_matrix=T) {
res <- x[get(refvar) %in% c(to,from), if(.N>1) .SD, by=ID, .SDcols = c(refvar, "State", "Values")]
res <- res[, id:=1:.N, by=ID]
res <- dcast(res, ID~id, value.var=c("State", "Values"))[,.(Values=sum(Values_2,na.rm=T)), .(State_1, State_2)]
setnames(res,c("from","to", "Values"))
if(return_matrix) return(convert_transits_to_matrix(res, unique(x$State)))
res
}
Notice below that I made a minor update to my convert_transits_to_matrix
function, so that this helper function works with both val_transit()
and num_transit()
. The minor update is in the 2nd line, where I use transits[[3]]
, so that it works regardless of the actual 3rd column name in transits
object.
convert_transits_to_matrix <- function(transits,states) {
m = matrix(NA, nrow=length(states), ncol=length(states), dimnames=list(states,states))
m[as.matrix(transits[,.(to,from)])] <- transits[[3]]
m = data.table(m)[,to_state:=rownames(m)]
setcolorder(m,"to_state")
return(m[])
}
Usage:
val_transit(data,"2020-02","2020-04", "Period_2")
to_state X0 X1 X2
<char> <num> <num> <num>
1: X0 NA NA 9
2: X1 NA NA NA
3: X2 NA NA NA
val_transit(data,1,3, "Period_1")
to_state X0 X1 X2
<char> <num> <num> <num>
1: X0 4 NA 9
2: X1 NA NA NA
3: X2 15 NA NA
Make sure your data
is setDT(data) before feeding it to these functions.