Unable to create and remove too many columns inside a function

Time:08-05

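Adding and then removing a large number of columns by reference inside a function stops working once the number of columns gets large enough: the removals are not reflected in the data.table I passed in. The same code works fine outside of the function.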

g <- function(d,n=200){
    
    #create n columns
    for (ii in 1:n){
        cname = paste0('x',ii)
        d[,c(cname):= ii]
    }
    
    #remove n-5 columns
    for (ii in 1:(n-5)){
        cname = paste0('x',ii)
        d[,c(cname):= NULL]
    }
    
}

library(data.table)

d = data.table(a=1:10)
g(d,1023)
print(d)
print(dim(d))
#[1] 10  6

d = data.table(a=1:10)
g(d,1025)
print(dim(d))
#[1]   10 1025

#R 4.0.2
#data.table 1.13.0

Answer:

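Please report this on the data.table issue tracker. It looks like a bug related to over-allocation. data.table pre-allocates spare column slots (the truelength, 1025 here) so that := can add columns by reference. The second case needs more slots than that, so the table is reallocated at a new memory address with a larger truelength, but that new address is bound only to d inside the function, not to the reference outside it.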

library(data.table) #version 1.14.2

g <- function(d,n=200){
  #create n columns
  for (ii in 1:n){
    cname = paste0('x',ii)
    d[,c(cname):= ii]
  }

  message("truelength is: ", truelength(d))
  message("length is: ", length(d))

  #remove n-5 columns
  for (ii in 1:(n-5)){
    cname = paste0('x',ii)
    set(d, j = cname, value = NULL)
  }
}

d = data.table(a=1:10)
truelength(d)
#[1] 1025
g(d,1023)
#truelength is: 1025
#length is: 1024

truelength(d)
#[1] 1025
length(d)
#[1] 6

d = data.table(a=1:10)
g(d,1025)
#truelength is: 2050
#length is: 1026
truelength(d)
#[1] 1025
length(d)
#[1] 1025

tail(names(d))
#[1] "x1019" "x1020" "x1021" "x1022" "x1023" "x1024"
#column x1025 has not been created at the original address
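
The address change described above can be checked directly with data.table's address() function. The following is a minimal sketch, assuming the default over-allocation (option datatable.alloccol = 1024); add_cols is a hypothetical helper written for illustration, not part of data.table.

```r
library(data.table)

# Hypothetical helper: add n columns by reference and report the address
# of the local `d` before and after, plus the column count seen inside.
add_cols <- function(d, n) {
  before <- address(d)
  for (ii in seq_len(n)) {
    cname <- paste0("x", ii)
    d[, (cname) := ii]
  }
  list(before = before, after = address(d), ncol_inside = length(d))
}

d <- data.table(a = 1:10)
res <- add_cols(d, 1025)

res$before == address(d)  # did the local d start out as the caller's object?
res$after == address(d)   # does the local d still point at the caller's object?
res$ncol_inside           # number of columns visible inside the function
length(d)                 # number of columns visible to the caller
```

If the second comparison is FALSE, the local d was reallocated and every later change by reference inside the function (including the removals in g) acts on the new object rather than on the caller's.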

For now, you can work around the issue by over-allocating more column slots before you call g:

d = data.table(a=1:10)
setalloccol(d, 2050)
g(d,1025)
#truelength is: 2051
#length is: 1026
print(dim(d))
#[1] 10  6

Or have the function return the data.table and assign the result back to d in the caller.
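
A minimal sketch of the return-value approach, assuming the same setup as the question; g_ret is a hypothetical name for a variant of g that returns the table it actually modified.

```r
library(data.table)

# Hypothetical variant of g(): same loops, but the (possibly reallocated)
# local table is returned so the caller can rebind its own variable.
g_ret <- function(d, n = 200) {
  # create n columns
  for (ii in 1:n) {
    cname <- paste0("x", ii)
    d[, (cname) := ii]
  }
  # remove the first n-5 of them
  for (ii in 1:(n - 5)) {
    cname <- paste0("x", ii)
    d[, (cname) := NULL]
  }
  d
}

d <- data.table(a = 1:10)
d <- g_ret(d, 1025)  # rebind d to whatever object the function ended up with
print(dim(d))
print(names(d))
```

Reassigning the result means the caller no longer depends on whether the table was reallocated inside the function, at the cost of giving up a purely by-reference interface.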
