The dataset is very large, so the per-group model fitting needs to run in parallel. The following is a synthetic dataset:
require(data.table)
require(furrr)
Names<-c("Estimate","Std.Error","t-value","Pr(>|t|)")
lm_summary<-function(Data){coef(summary(lm(Y~.,data =Data)))["X",]}
Synthetic_Data<-data.table(id=rep(seq(1,10000),each=1000),X=rnorm(1e6),Y=rnorm(1e6),key="id")
Synthetic_Data<-Synthetic_Data[,list(nested_DT=list(data.table(X,Y))),by="id"]
I've tried this, but it doesn't work:
plan(multisession,workers=6)
Synthetic_Data[,(Names):=future_map(nested_DT,lm_summary),.SDcols=Names]
It gives this error: Supplied 4 columns to be assigned 10000 items. Please see NEWS for v1.12.2
However, this works perfectly fine:
Synthetic_Data[,Model:=future_map(nested_DT,lm_summary)]
But instead of a single Model list column, I need the columns listed in Names appended to the data.table.
CodePudding user response:
The error occurs because map or lapply returns a list of nrow elements, each of length 4, whereas := expects a list of 4 columns, each of length nrow. transpose solves this and seems quite efficient, without the need for futures (data.table has built-in multithreading capabilities):
Synthetic_Data[,(Names):=transpose(lapply(nested_DT,lm_summary))]
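To make the shape change concrete, here is a minimal sketch on a much smaller stand-in table (5 groups of 50 rows; the sizes, the seed, and the names DT, per_group, and per_column are illustrative assumptions, not part of the question). It reuses the question's Names and lm_summary and only relies on data.table:

```r
library(data.table)

# Small stand-in for the nested table: 5 groups of 50 rows (arbitrary sizes)
set.seed(1)
DT <- data.table(id = rep(1:5, each = 50), X = rnorm(250), Y = rnorm(250), key = "id")
DT <- DT[, list(nested_DT = list(data.table(X, Y))), by = "id"]

Names <- c("Estimate", "Std.Error", "t-value", "Pr(>|t|)")
lm_summary <- function(Data) coef(summary(lm(Y ~ ., data = Data)))["X", ]

# lapply() yields one length-4 vector per group: a list with 5 elements
per_group <- lapply(DT$nested_DT, lm_summary)
length(per_group)    # 5 (one entry per group)
lengths(per_group)   # 4 4 4 4 4

# transpose() flips it into 4 vectors of length 5, one per target column
per_column <- transpose(per_group)
length(per_column)   # 4

# Now the right-hand side matches what := expects for 4 new columns
DT[, (Names) := per_column]
```

If parallel fitting is still wanted, the same transpose call should in principle also accept the output of future_map(nested_DT, lm_summary), since that is likewise a list with one length-4 vector per group; whether that is faster than plain lapply would need to be measured on the real data.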
CodePudding user response:
I have a solution, but it is inelegant:
Synthetic_Dat<-cbind(Synthetic_Data,setDT(future_map_dfr(Synthetic_Data$nested_DT,lm_summary)))