Proper use of seq_along() versus unique() functions within a looping function?

Time:03-10

I'm learning how to use the R lapply() function and am benchmarking it against other options, in generating a transition matrix.

When the ID column of my data frame holds long numeric values, lapply() over seq_along() doesn't work. Or perhaps the issue resides in seq_along(), not lapply(). For example, if I set up the dataTest data frame as shown below, where each numeric value in the ID column is only 1 digit long, then the reproducible code at the bottom works fine:

dataTest <- 
    data.frame(
      ID = c(1,1,1,2,2,2,3,3,3),
      Period = c(1, 2, 3, 1, 2, 3, 1, 2, 3),
      Balance = c(5, 10, 15, 0, 2, 4, 3, 6, 9),
      Flags = c("X00","X01","X00","X01","X02","X02","X02","X01","X01")
    )

Correct results:

> numTransit(dataTest, 1,3)
    X00 X01 X02
X00   1   0   0
X01   0   0   1
X02   0   1   0

But if I replace the above ID column with the 7-digit values below, it no longer works! It gives me only 0 values in the above transition matrix.

ID = c(1930145,1930145,1930145,1930146,1930146,1930146,1930147,1930147,1930147)

And here is the reproducible code, using lapply()/seq_along(), to run against the data above:

# Function to set-up base transition matrix with all 0 values:
  transMat <- function(x){
    df <- data.frame(matrix(0, ncol=length(unique(x$Flags)), nrow=length(unique(x$Flags))))
    row.names(df) <- unique(x$Flags)
    names(df) <- unique(x$Flags)
    return(df)
  }

# Function to populate transition matrix with number of transition events:
numTransit <- function(x, from=1, to=3){
    df <- transMat(x)
    lapply(seq_along(unique(x$ID)), function(i){
      id_from <- as.character(x$Flags[(x$ID == i & x$Period == from)])
      id_to <- as.character(x$Flags[x$ID == i & x$Period == to])
      column <- which(names(df) == id_from)
      row <- which(row.names(df) == id_to)
      df[row, column] <<- df[row, column] + 1
    })
    return(df)
  }

# Now to run the functions:
numTransit(dataTest,1,3)

If I replace the above lapply()/seq_along() with a for-loop, the code runs fine regardless of the length of the ID values. I can post the for-loop code if anyone likes, please let me know.

CodePudding user response:

The problem is neither lapply() nor seq_along() as such, but what you pass as the X argument to lapply().

seq_along(x) returns the positions of the elements of x, that is, the integers from 1 to length(x), not the values stored in x.

For example, if we have a vector that has three elements:

seq_along(c(534624, 56235, 62))

Returns:

[1] 1 2 3

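Therefore, x$ID == i compares the ID column against 1, 2 and 3. That happens to work when the IDs themselves are 1, 2 and 3, as in your first example, but none of the 7-digit IDs equal 1, 2 or 3, so no transition is ever counted and the matrix stays at all zeros.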

So you need to use lapply(unique(x$ID), function(i) ...).
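To make the difference between positions and values concrete, here is a minimal sketch. The vector ids and the helper names u and counts are made up for illustration and are not part of the original code. It also shows that seq_along() can still be used, as long as the position is turned back into a value with u[i] before comparing.

```r
# Hypothetical ID vector, shortened from the question's data
ids <- c(1930145, 1930145, 1930146, 1930147)

# Positions of the unique IDs versus the unique IDs themselves
positions <- seq_along(unique(ids))   # 1, 2, 3
values    <- unique(ids)              # 1930145, 1930146, 1930147

# Comparing the IDs against a position finds no match here
any(ids == positions[1])              # FALSE

# If you prefer to loop over positions, look the value up first
u <- unique(ids)
counts <- vapply(seq_along(u), function(i) sum(ids == u[i]), integer(1))
counts                                # number of rows per unique ID
```

Looping over unique(x$ID) directly is the shorter option; looping over seq_along(u) with u[i] is mainly useful when the position itself is also needed, for example to fill the i-th slot of a result.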

Here is the full code (I basically only changed your lapply() part):

Input

dataTest <- 
  data.frame(
    ID = c(1,1,1,2,2,2,3,3,3),
    Period = c(1, 2, 3, 1, 2, 3, 1, 2, 3),
    Balance = c(5, 10, 15, 0, 2, 4, 3, 6, 9),
    Flags = c("X00","X01","X00","X01","X02","X02","X02","X01","X01")
  )

ID = c(1930145,1930145,1930145,1930146,1930146,1930146,1930147,1930147,1930147)

dataTest[, 1] <- ID

dataTest
       ID Period Balance Flags
1 1930145      1       5   X00
2 1930145      2      10   X01
3 1930145      3      15   X00
4 1930146      1       0   X01
5 1930146      2       2   X02
6 1930146      3       4   X02
7 1930147      1       3   X02
8 1930147      2       6   X01
9 1930147      3       9   X01

Code and output

transMat <- function(x){
  df <- data.frame(matrix(0, ncol=length(unique(x$Flags)), nrow=length(unique(x$Flags))))
  row.names(df) <- unique(x$Flags)
  names(df) <- unique(x$Flags)
  return(df)
}

# Function to populate transition matrix with number of transition events:
numTransit <- function(x, from=1, to=3){
  df <- transMat(x)
  lapply(unique(x$ID), function(i){
    id_from <- as.character(x$Flags[(x$ID == i & x$Period == from)])
    id_to <- as.character(x$Flags[x$ID == i & x$Period == to])
    column <- which(names(df) == id_from)
    row <- which(row.names(df) == id_to)
    df[row, column] <<- df[row, column] + 1
  })
  return(df)
}

numTransit(dataTest,1,3)

    X00 X01 X02
X00   1   0   0
X01   0   0   1
X02   0   1   0
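As a side note, the same counts can probably be obtained without lapply() and the <<- assignment. The following is only a sketch of a different technique (merge() plus table()), not the approach from the question; the function name numTransitTable is hypothetical, and it assumes each ID has exactly one row for the from period and one row for the to period.

```r
# Same test data as above, with the 7-digit IDs
dataTest <- data.frame(
  ID = c(1930145, 1930145, 1930145, 1930146, 1930146, 1930146,
         1930147, 1930147, 1930147),
  Period = c(1, 2, 3, 1, 2, 3, 1, 2, 3),
  Balance = c(5, 10, 15, 0, 2, 4, 3, 6, 9),
  Flags = c("X00", "X01", "X00", "X01", "X02", "X02", "X02", "X01", "X01")
)

# Hypothetical alternative to numTransit(): count transitions with table()
numTransitTable <- function(x, from = 1, to = 3) {
  flags <- unique(x$Flags)
  start <- x[x$Period == from, c("ID", "Flags")]
  end   <- x[x$Period == to,   c("ID", "Flags")]
  # Pair each ID's flag in the 'from' period with its flag in the 'to' period
  pairs <- merge(start, end, by = "ID", suffixes = c("_from", "_to"))
  # Rows are the 'to' flag and columns the 'from' flag, as in numTransit()
  table(
    to   = factor(pairs$Flags_to,   levels = flags),
    from = factor(pairs$Flags_from, levels = flags)
  )
}

res <- numTransitTable(dataTest, 1, 3)
res
```

Because the IDs are only used as a merge key here, their magnitude does not matter. The result is a table rather than a data frame, so wrap it in as.data.frame.matrix() if a data frame is required.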