Why does this lapply() function applied to a dataframe not produce the same results as its for-loop

Time:03-11

In the reproducible code below, the custom balTransit() function correctly populates a value transition table using a for-loop, while the custom balTransit_1() function is supposed to do the same using lapply() but doesn't work. What am I doing wrong in my implementation of lapply()? Running the code gives these results:

balTransit (correct results):

> test
   X1 X0 X2
X1  0  0  3
X0  0 50  0
X2  5  0  0 

balTransit_1 (incorrect, all 0's):

> test_1
   X1 X0 X2
X1  0  0  0
X0  0  0  0
X2  0  0  0

Enhanced explanation:

My main objective here is to learn how to use the apply() family of functions, for their perceived benefits. I’ve been going through simple tutorials. A secondary objective is to generate a transition matrix from a base data frame. Once I figure this out with lapply() (or whichever apply() function is most suitable), I’m going to speed-test the various options (for-loop, data.table, lapply(), etc.) against the actual data set of 2.5 million rows.

What I’m doing is creating a transition matrix (technically here a data frame) showing the flow of values (balances) from one “Flags” category to another “Flags” category, over the periods specified by the user. So, in my “for-loop” reproducible example which works correctly, the user has specified a “From” period of 1 and a “To” period of 3. The transition matrix is then generated as shown in the image now posted at the bottom.

A related post yesterday.

Answer:

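Your lapply code fails because the function passed to lapply() has its own environment: inside it, <- modifies a local copy of DF_1 that is discarded when the function returns. To make your lapply code work, just replace <- with <<-: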

DF_1[row, column] <<- DF_1[row, column] + val

Please see ?assignOps for more info.

However, again, I wouldn't recommend lapply in this case (<<- should be avoided in general).
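The scoping difference can be seen in isolation with a small sketch. The table `counts` and the functions `bump_local()` and `bump_super()` are hypothetical names made up for this illustration; they are not the asker's balTransit_1() code, only a minimal stand-in for the same pattern.

```r
# Hypothetical 2x2 table, used only for this illustration
counts <- matrix(0, nrow = 2, ncol = 2,
                 dimnames = list(c("a", "b"), c("a", "b")))

# Variant 1: plain `<-` inside the function passed to lapply()
bump_local <- function() {
  tbl <- counts
  invisible(lapply(1:3, function(i) {
    # creates a new `tbl` local to this anonymous function;
    # the `tbl` of bump_local() is never touched
    tbl["a", "b"] <- tbl["a", "b"] + 1
  }))
  tbl
}

# Variant 2: `<<-` assigns to the `tbl` found in the enclosing environment
bump_super <- function() {
  tbl <- counts
  invisible(lapply(1:3, function(i) {
    tbl["a", "b"] <<- tbl["a", "b"] + 1
  }))
  tbl
}

res_local <- bump_local()
res_super <- bump_super()
print(res_local["a", "b"])  # still 0: every update was made on a discarded copy
print(res_super["a", "b"])  # 3: the updates reached the enclosing table
```

Here `<<-` works only because a variable named `tbl` exists in the enclosing function; if no such variable is found anywhere, `<<-` creates one in the global environment, which is one reason it is usually avoided.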

Here is a data.table approach:

library(data.table)

DT <- setDT(data.frame(
  ID = c(1,1,1,2,2,2,3,3,3),
  Period = c(1, 2, 3, 1, 2, 3, 1, 2, 3),
  Values = c(5, 10, 15, 50, 2, 4, 3, 6, 9),
  Flags = c("X1","X0","X2","X0","X2","X0", "X2","X1","X1")
))

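# all from/to flag combinations, so that empty cells also appear in the result
unique_flags <- unique(DT$Flags)
all_flags <- setDT(expand.grid(list(first_flag = unique_flags, last_flag = unique_flags)))

# per ID: flag in the first and last row plus the starting value,
# joined to all combinations and reshaped to wide format (rows = to, columns = from)
resultDT <- dcast(
  data = DT[, .(first_flag = first(Flags), last_flag = last(Flags), first_value = first(Values)), by = ID][
    all_flags, on = c("first_flag", "last_flag")],
  last_flag ~ first_flag,
  fun.aggregate = sum,
  value.var = "first_value"
)

# combinations that no ID matches are NA after the join; replace them with 0
for (col_i in seq_len(ncol(resultDT))){
  set(resultDT, which(is.na(resultDT[[col_i]])), col_i, 0)
}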
print(resultDT)

Result:

   last_flag X0 X1 X2
1:        X0 50  0  0
2:        X1  0  0  3
3:        X2  0  5  0

# step by step ------------------------------------------------------------
library(data.table)

DT <- setDT(data.frame(
  ID = c(1,1,1,2,2,2,3,3,3,4,4,4),
  Period = c(1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3),
  Values = c(5, 10, 15, 50, 2, 4, 3, 6, 9, 3, 6, 9),
  Flags = c("X1","X0","X2","X0","X2","X0", "X2","X1","X1", "X2","X1","X1")
))

unique_flags <- unique(DT$Flags)
all_flags <- setDT(expand.grid(list(first_flag = unique_flags, last_flag = unique_flags)))

resultDT <- DT[, .(first_flag = first(Flags), last_flag = last(Flags), first_value = first(Values)), by = ID] # find relevant flags
resultDT <- resultDT[all_flags, on = c("first_flag", "last_flag")] # merge all combinations
resultDT <- dcast(resultDT, last_flag ~ first_flag, fun.aggregate = sum, value.var = "first_value") # dcast
for (col_i in seq_len(ncol(resultDT))){
  set(resultDT, which(is.na(resultDT[[col_i]])), col_i, 0)
}
print(resultDT)
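
For comparison, the same table can probably also be built in base R without a loop or `<<-`. This is only a sketch under assumptions: it uses the 3-ID example data from above, takes "From" as Period 1 and "To" as Period 3 as in the question, and the names `from`, `to`, `m`, and `tab` are made up for the illustration.

```r
DF <- data.frame(
  ID = c(1,1,1,2,2,2,3,3,3),
  Period = c(1, 2, 3, 1, 2, 3, 1, 2, 3),
  Values = c(5, 10, 15, 50, 2, 4, 3, 6, 9),
  Flags = c("X1","X0","X2","X0","X2","X0", "X2","X1","X1")
)

# rows at the assumed "From" and "To" periods, paired up by ID
from <- DF[DF$Period == 1, ]
to   <- DF[DF$Period == 3, ]
m <- merge(from, to, by = "ID", suffixes = c("_from", "_to"))

# factors with all flag levels, so combinations that never occur still get a cell
lv <- sort(unique(DF$Flags))
m$Flags_from <- factor(m$Flags_from, levels = lv)
m$Flags_to   <- factor(m$Flags_to, levels = lv)

# sum the starting values per (to, from) pair; empty cells are 0
tab <- xtabs(Values_from ~ Flags_to + Flags_from, data = m)
print(tab)
```

With this layout the rows are the "To" flags and the columns the "From" flags, matching the orientation of the data.table result above.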