I can't add more than about 1024 columns to a data.table by reference inside a function. The same code works fine outside of a function.
g <- function(d, n = 200) {
  # create n columns
  for (ii in 1:n) {
    cname = paste0('x', ii)
    d[, c(cname) := ii]
  }
  # remove n-5 columns
  for (ii in 1:(n - 5)) {
    cname = paste0('x', ii)
    d[, c(cname) := NULL]
  }
}
library(data.table)
d = data.table(a=1:10)
g(d,1023)
print(d)
print(dim(d))
#[1] 10 6
d = data.table(a=1:10)
g(d,1025)
print(dim(d))
#[1] 10 1025
#R 4.0.2
#data.table 1.13.0
Answer:
Please take this to the data.table bug tracker; it looks like a bug related to over-allocation. A data.table reserves spare column-pointer slots up front (its truelength). The second case needs more columns than that, so := re-allocates the table, which changes its memory address. The new address is bound to d inside the function, but not to the d that the caller passed in.
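To make the over-allocation mechanism concrete, here is a minimal top-level sketch. It assumes data.table's default over-allocation behaviour; the column names x1, x2, ... and extra are arbitrary. Filling the spare slots leaves the address unchanged, and one column more forces a re-allocation. At top level the new table is rebound to d in the global environment, so nothing is lost there.

```r
library(data.table)

d <- data.table(a = 1:10)
spare <- truelength(d) - length(d)   # free column-pointer slots
addr0 <- address(d)

# Fill the spare slots: each new column pointer goes into pre-allocated
# space, so the table is modified in place and keeps its address.
for (ii in seq_len(spare)) d[, (paste0("x", ii)) := ii]
stopifnot(identical(address(d), addr0))

# One more column exceeds truelength: data.table re-allocates the table
# and rebinds the name `d` in the calling scope (here, the global env).
d[, extra := 0L]
address(d) != addr0   # TRUE: `d` now refers to a new allocation
```

Inside a function, that "calling scope" is the function's own frame. This is why only the local d sees the re-allocated table.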
library(data.table) #version 1.14.2
g <- function(d, n = 200) {
  # create n columns
  for (ii in 1:n) {
    cname = paste0('x', ii)
    d[, c(cname) := ii]
  }
  message("truelength is: ", truelength(d))
  message("length is: ", length(d))
  # remove n-5 columns
  for (ii in 1:(n - 5)) {
    cname = paste0('x', ii)
    set(d, j = cname, value = NULL)
  }
}
d = data.table(a=1:10)
truelength(d)
#[1] 1025
g(d,1023)
#truelength is: 1025
#length is: 1024
truelength(d)
#[1] 1025
length(d)
#[1] 6
d = data.table(a=1:10)
g(d,1025)
#truelength is: 2050
#length is: 1026
truelength(d)
#[1] 1025
length(d)
#[1] 1025
tail(names(d))
#[1] "x1019" "x1020" "x1021" "x1022" "x1023" "x1024"
#column x1025 was added only to the re-allocated copy, not at the original address
For now you can work around the issue by over-allocating enough column slots before you call g:
d = data.table(a=1:10)
setalloccol(d, 2050)
g(d,1025)
#truelength is: 2051
#length is: 1026
print(dim(d))
#[1] 10 6
Alternatively, have the function return the data.table and reassign the result in the caller.
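A sketch of that second workaround follows. g2 is a hypothetical variant of g (not part of data.table) that returns d at the end. Because := rebinds the function's local d to the re-allocated table, returning d hands the caller the current copy, whether or not a re-allocation happened.

```r
library(data.table)

# Hypothetical variant of g(): identical loops, but returns the table.
g2 <- function(d, n = 200) {
  # create n columns (may trigger a re-allocation that rebinds local d)
  for (ii in seq_len(n)) {
    cname <- paste0("x", ii)
    d[, (cname) := ii]
  }
  # remove n-5 columns from whatever allocation local d now points to
  for (ii in seq_len(n - 5)) {
    cname <- paste0("x", ii)
    set(d, j = cname, value = NULL)
  }
  d  # return the current table so the caller can rebind to it
}

d <- data.table(a = 1:10)
d <- g2(d, 1025)   # reassign: the caller's d now refers to the returned table
print(dim(d))
```

The reassignment d <- g2(d, 1025) is the essential part. Calling g2(d, 1025) without it would leave the caller's d in the same state as with the original g.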