I'm learning how to use the R lapply() function and am benchmarking it against other options for generating a transition matrix. When the ID column of my data frame contains long numeric values, my lapply() over seq_along() stops working. Perhaps the issue lies in seq_along() rather than lapply(). For example, if I set up the dataTest data frame as shown below, where each numeric value in the ID column is only 1 digit long, then the reproducible code at the bottom works fine:
dataTest <-
  data.frame(
    ID = c(1,1,1,2,2,2,3,3,3),
    Period = c(1, 2, 3, 1, 2, 3, 1, 2, 3),
    Balance = c(5, 10, 15, 0, 2, 4, 3, 6, 9),
    Flags = c("X00","X01","X00","X01","X02","X02","X02","X01","X01")
  )
Correct results:
> numTransit(dataTest, 1,3)
    X00 X01 X02
X00   1   0   0
X01   0   0   1
X02   0   1   0
But if I replace the ID column above with the 7-digit values below, it no longer works: every entry in the transition matrix above comes back as 0.
ID = c(1930145,1930145,1930145,1930146,1930146,1930146,1930147,1930147,1930147)
And here is the reproducible code using lapply()/seq_along() to test the above against:
# Function to set up base transition matrix with all 0 values:
transMat <- function(x){
  df <- data.frame(matrix(0, ncol=length(unique(x$Flags)), nrow=length(unique(x$Flags))))
  row.names(df) <- unique(x$Flags)
  names(df) <- unique(x$Flags)
  return(df)
}
# Function to populate transition matrix with number of transition events:
numTransit <- function(x, from=1, to=3){
  df <- transMat(x)
  lapply(seq_along(unique(x$ID)), function(i){
    id_from <- as.character(x$Flags[(x$ID == i & x$Period == from)])
    id_to <- as.character(x$Flags[x$ID == i & x$Period == to])
    column <- which(names(df) == id_from)
    row <- which(row.names(df) == id_to)
    df[row, column] <<- df[row, column] + 1
  })
  return(df)
}
# Now to run the functions:
numTransit(dataTest,1,3)
If I replace the above lapply()/seq_along() with a for-loop, the code runs fine regardless of the length of the ID values. I can post the for-loop code if anyone would like to see it.
CodePudding user response:
The problem is not with lapply() or seq_along() themselves, but with what you pass as the X argument of lapply().
seq_along(x) returns the integer vector from 1 to the number of elements in x.
For example, if we have a vector that has three elements:
seq_along(c(534624, 56235, 62))
Returns:
[1] 1 2 3
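Therefore, when you use x$ID == i, it looks for rows whose ID is 1, 2 or 3. That happens to work when the IDs themselves are 1, 2 and 3, as in your first example, but it matches nothing with the 7-digit IDs, so no cell is ever incremented.
So you need to iterate over the ID values themselves: lapply(unique(x$ID), function(i) ...).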
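To see the mismatch concretely, here is a minimal sketch using the 7-digit IDs from the question. It compares the positions that seq_along() yields with the values that unique() yields. The names ids, by_index, by_value and the matches_* vectors are illustrative only and do not appear in the original code.

```r
# Seven-digit IDs from the question
ids <- c(1930145, 1930145, 1930145, 1930146, 1930146, 1930146,
         1930147, 1930147, 1930147)

# seq_along() gives positions (1, 2, 3), not the ID values themselves
by_index <- seq_along(unique(ids))
# unique() gives the actual ID values
by_value <- unique(ids)

# No row has an ID equal to 1, 2 or 3, so nothing matches
matches_index <- sapply(by_index, function(i) sum(ids == i))
# Each real ID value matches its three rows
matches_value <- sapply(by_value, function(i) sum(ids == i))
# seq_along() can be kept if each position is mapped back to its ID
matches_mapped <- sapply(seq_along(by_value), function(i) sum(ids == by_value[i]))

print(matches_index)   # 0 0 0
print(matches_value)   # 3 3 3
print(matches_mapped)  # 3 3 3
```

The last variant is worth knowing when the loop body also needs the position, for example to fill the i-th slot of a preallocated result.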
Here is the full code (I basically only changed your lapply() part):
Input
dataTest <-
  data.frame(
    ID = c(1,1,1,2,2,2,3,3,3),
    Period = c(1, 2, 3, 1, 2, 3, 1, 2, 3),
    Balance = c(5, 10, 15, 0, 2, 4, 3, 6, 9),
    Flags = c("X00","X01","X00","X01","X02","X02","X02","X01","X01")
  )
ID = c(1930145,1930145,1930145,1930146,1930146,1930146,1930147,1930147,1930147)
dataTest[, 1] <- ID
dataTest
       ID Period Balance Flags
1 1930145      1       5   X00
2 1930145      2      10   X01
3 1930145      3      15   X00
4 1930146      1       0   X01
5 1930146      2       2   X02
6 1930146      3       4   X02
7 1930147      1       3   X02
8 1930147      2       6   X01
9 1930147      3       9   X01
Output
transMat <- function(x){
  df <- data.frame(matrix(0, ncol=length(unique(x$Flags)), nrow=length(unique(x$Flags))))
  row.names(df) <- unique(x$Flags)
  names(df) <- unique(x$Flags)
  return(df)
}
# Function to populate transition matrix with number of transition events:
numTransit <- function(x, from=1, to=3){
  df <- transMat(x)
  # Iterate over the ID values themselves, not over their positions
  lapply(unique(x$ID), function(i){
    id_from <- as.character(x$Flags[(x$ID == i & x$Period == from)])
    id_to <- as.character(x$Flags[x$ID == i & x$Period == to])
    column <- which(names(df) == id_from)
    row <- which(row.names(df) == id_to)
    df[row, column] <<- df[row, column] + 1
  })
  return(df)
}
numTransit(dataTest,1,3)
    X00 X01 X02
X00   1   0   0
X01   0   0   1
X02   0   1   0
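Since the question mentions benchmarking lapply() against other options, one possible cross-check of the result above is a vectorised sketch built on merge() and table(). This is not part of the original code: the names from_rows, to_rows, pairs, states and check are hypothetical, and the sketch assumes each ID has exactly one row per period, as in the sample data.

```r
# Sample data from the question, with the 7-digit IDs
dataTest <- data.frame(
  ID = c(1930145, 1930145, 1930145, 1930146, 1930146, 1930146,
         1930147, 1930147, 1930147),
  Period = c(1, 2, 3, 1, 2, 3, 1, 2, 3),
  Flags = c("X00", "X01", "X00", "X01", "X02", "X02", "X02", "X01", "X01")
)

# Flag of each ID in the starting period and in the ending period
from_rows <- dataTest[dataTest$Period == 1, c("ID", "Flags")]
to_rows <- dataTest[dataTest$Period == 3, c("ID", "Flags")]

# One row per ID, with columns ID, Flags_from and Flags_to
pairs <- merge(from_rows, to_rows, by = "ID", suffixes = c("_from", "_to"))

# Fix the factor levels so states with no transitions still appear
states <- unique(dataTest$Flags)

# Rows are the "to" state and columns the "from" state, as in numTransit()
check <- table(
  to = factor(pairs$Flags_to, levels = states),
  from = factor(pairs$Flags_from, levels = states)
)
print(check)
```

The counts should agree with the numTransit() matrix above, which makes this a convenient sanity check when comparing implementations.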