Efficient data.table rowwise and insertion of new columns

Time:09-07

My dataset is very large, so the computation needs to run in parallel. The following synthetic dataset reproduces the problem:

require(data.table)
require(furrr)

Names<-c("Estimate","Std.Error","t-value","Pr(>|t|)") 
lm_summary<-function(Data){coef(summary(lm(Y~.,data =Data)))["X",]}
Synthetic_Data<-data.table(id=rep(seq(1,10000),each=1000),X=rnorm(1e6),Y=rnorm(1e6),key="id")
Synthetic_Data<-Synthetic_Data[,list(nested_DT=list(data.table(X,Y))),by="id"]

I've tried this, but it doesn't work:

plan(multisession,workers=6)
Synthetic_Data[,(Names):=future_map(nested_DT,lm_summary),.SDcols=Names]

It gives this error: Supplied 4 columns to be assigned 10000 items. Please see NEWS for v1.12.2

However, this works perfectly fine:

Synthetic_Data[,Model:=future_map(nested_DT,lm_summary)]

But instead of a single Model list column, I need the four Names columns appended to the data.table.

CodePudding user response:

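The error occurs because map or lapply returns a list with one element per row (nrow elements, each of length 4), whereas := with four column names expects a list with one element per column (4 elements, each of length nrow).
data.table's transpose converts between the two shapes and seems quite efficient, so futures may not be needed here (data.table multithreads some of its own operations internally, although the lapply call itself still runs sequentially):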

Synthetic_Data[,(Names):=transpose(lapply(nested_DT,lm_summary))]
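
To make the shape difference concrete, here is a minimal sketch on a much smaller dataset. The sizes (5 ids with 50 rows each), the seed, and the names Small, by_row and by_col are illustrative assumptions and are not part of the question. It compares the list returned by lapply with the list that data.table's transpose() builds from it.

```r
library(data.table)

set.seed(1)  # illustrative seed, only for reproducibility
Names <- c("Estimate", "Std.Error", "t-value", "Pr(>|t|)")
lm_summary <- function(Data) coef(summary(lm(Y ~ ., data = Data)))["X", ]

# Small stand-in for Synthetic_Data: 5 ids, 50 rows each (assumed sizes)
Small <- data.table(id = rep(1:5, each = 50), X = rnorm(250), Y = rnorm(250), key = "id")
Small <- Small[, list(nested_DT = list(data.table(X, Y))), by = "id"]

# Row-wise list: one element per row, each a numeric vector of length 4
by_row <- lapply(Small$nested_DT, lm_summary)
# Column-wise list: one element per statistic, each of length nrow(Small)
by_col <- transpose(by_row)

length(by_row); lengths(by_row)
length(by_col); lengths(by_col)

# := with four names on the left-hand side needs the column-wise list
Small[, (Names) := by_col]
```

If parallel execution is still wanted, the same step should carry over to the furrr version, for example Synthetic_Data[,(Names):=data.table::transpose(future_map(nested_DT,lm_summary))], since future_map also returns a row-wise list. Writing data.table::transpose explicitly avoids picking up purrr's transpose by accident if purrr happens to be attached.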

CodePudding user response:

I have a solution but it is inelegant.

Synthetic_Dat<-cbind(Synthetic_Data,setDT(future_map_dfr(Synthetic_Data$nested_DT,lm_summary)))