Home > database >  How to modify this data.table code to show balance transitions instead of event frequency transition
How to modify this data.table code to show balance transitions instead of event frequency transition

Time:03-19

I am using the below MWE code to generate a data frame of transition frequencies. It works well and fast. I am new to the data.table package and am having trouble converting it to show balance transitions.

To start, below is the example data frame, the transition frequency output when running the functions (using the two time measurements of "Period_1" and "Period_2"), and the underlying MWE code for those functions, all of which work as intended for transition frequencies:

> data
   ID Period_1 Period_2 Values State
1:  1        1  2020-01      5    X0
2:  1        2  2020-02     10    X1
3:  1        3  2020-03     15    X2
4:  2        1  2020-04      0    X0
5:  2        2  2020-05      2    X2
6:  2        3  2020-06      4    X0
7:  3        1  2020-02      3    X2
8:  3        2  2020-03      6    X1
9:  3        3  2020-04      9    X0

> setDT(data)
> num_transit(data, "2020-02", "2020-04",refvar="Period_2")
   to_state X0 X1 X2
1:       X0 NA NA  1
2:       X1 NA NA NA
3:       X2 NA NA NA

> setDT(data)
> num_transit(data, 1,3, refvar="Period_1")
   to_state X0 X1 X2
1:       X0  1 NA  1
2:       X1 NA NA NA
3:       X2  1 NA NA

library(data.table)
   
data <- 
  data.frame(
    ID = c(1,1,1,2,2,2,3,3,3),
    Period_1 = c(1, 2, 3, 1, 2, 3, 1, 2, 3),
    Period_2 = c("2020-01","2020-02","2020-03","2020-04","2020-05","2020-06","2020-02","2020-03","2020-04"),
    Values = c(5, 10, 15, 0, 2, 4, 3, 6, 9),
    State = c("X0","X1","X2","X0","X2","X0", "X2","X1","X0")
  )
    
num_transit <- function(x,from,to,refvar="Period_2", return_matrix=T) {
  res <- x[get(refvar) %in% c(to,from), if(.N>1) .SD, by=ID, .SDcols = c(refvar, "State")]
  res <- res[, id:=1:.N, by=ID]
  res <- dcast(res, ID~id, value.var="State")[,.N, .(`1`,`2`)]
  setnames(res,c("from","to", "ct"))
  if(return_matrix) return(convert_transits_to_matrix(res, unique(x$State)))
  res
}
    
convert_transits_to_matrix <- function(transits,states) {
  m = matrix(NA, nrow=length(states), ncol=length(states), dimnames=list(states,states))
  m[as.matrix(transits[,.(to,from)])] <- transits$ct
  m = data.table(m)[,to_state:=rownames(m)]
  setcolorder(m,"to_state")
  return(m[])
}

Here is where I need help. I am trying to modify the above (call it "val_transit") to show the transition of the "Values" TO the new state. So the output would look like this, using the data dataframe and running Period_2 transitions from 1 to 3 (or val_transit(data, 1,3, refvar="Period_1")):

   to_state X0 X1 X2
1:       X0  4 NA  9
2:       X1 NA NA NA
3:       X2 15 NA NA

Any suggestions for doing this? This is a follow-on to transition frequency post How to create a table that measures transitions of elements over calendar periods?

CodePudding user response:

Sure, here is an update to the previous num_transit function. Notice the differences

  1. .SDcols included both State and Values in the first line of the function
  2. value.vars in the dcast call includes both State and Value
  3. As a result of (2) above, I group explicitly over State_1,State_2, instead of 1 and 2, and the summarizing action is to sum the Values
  4. The setnames call is adjusted to return the final column as Values if return_matrix=F
val_transit <- function(x,from,to,refvar="Period_2", return_matrix=T) {
  res <- x[get(refvar) %in% c(to,from), if(.N>1) .SD, by=ID, .SDcols = c(refvar, "State", "Values")]
  res <- res[, id:=1:.N, by=ID]
  res <- dcast(res, ID~id, value.var=c("State", "Values"))[,.(Values=sum(Values_2,na.rm=T)), .(State_1, State_2)]
  setnames(res,c("from","to", "Values"))
  if(return_matrix) return(convert_transits_to_matrix(res, unique(x$State)))
  res
}

Notice below that I made a minor update to my convert_transits_to_matrix function, so that this helper function works with both val_transit() and num_transit(). The minor update is in the 2nd line, where I use transits[[3]], so that it works regardless of the actual 3rd column name in transits object.

convert_transits_to_matrix <- function(transits,states) {
  m = matrix(NA, nrow=length(states), ncol=length(states), dimnames=list(states,states))
  m[as.matrix(transits[,.(to,from)])] <- transits[[3]]
  m = data.table(m)[,to_state:=rownames(m)]
  setcolorder(m,"to_state")
  return(m[])
}

Usage:

val_transit(data,"2020-02","2020-04", "Period_2")
   to_state    X0    X1    X2
     <char> <num> <num> <num>
1:       X0    NA    NA     9
2:       X1    NA    NA    NA
3:       X2    NA    NA    NA

val_transit(data,1,3, "Period_1")

   to_state    X0    X1    X2
     <char> <num> <num> <num>
1:       X0     4    NA     9
2:       X1    NA    NA    NA
3:       X2    15    NA    NA

Make sure your data is setDT(data) before feeding it to these functions.

  • Related