I have this for loop inside a function that reads specific columns from the files in a folder. Because there are many files, it is very slow.
How could I rewrite this using data.table?
I use arrange() because afterwards I bind the two data frames together by name. The names are the same in both files but are not in the same order, so I sort both by name before binding the class1 and class2 columns.
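library(readr)    # read_table()
library(dplyr)    # arrange(), %>%
library(stringr)  # str_remove()

for (i in seq_along(temp)) {
  # Read columns 1 and 18 of the "_automat" file, sorted by name
  df1 <- read_table(temp[[i]],
                    col_types = "c________________f__",
                    col_names = c("name", "class1")) %>%
    arrange(name)
  # Read the same columns from the matching file without "_automat"
  df2 <- read_table(str_remove(temp[[i]], "_automat"),
                    col_types = "c________________f__",
                    col_names = c("name", "class2")) %>%
    arrange(name)
}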
CodePudding user response:
If you just want to convert this to data.table, you can switch from read_table to fread, which is generally faster and returns a data.table that you can sort with [order(*)]:
library(data.table)
fread(file=temp[[i]], select = c(name='character', class1='numeric'))[order(name)]
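Note that the call above selects columns by name, so it assumes the files have a header row. The col_names argument in the question suggests they may not. In that case, a minimal sketch could select columns 1 and 18 by position instead (the positions are read off the col_types string in the question) and use merge() in place of sorting and binding. The helper names read_cols and make_file and the demo files are hypothetical, made up only for illustration:

```r
library(data.table)

# Hypothetical helper: read columns 1 and 18 of a headerless,
# whitespace-separated file and name them "name" and `class_col`.
read_cols <- function(path, class_col) {
  dt <- fread(file = path, header = FALSE, select = c(1L, 18L))
  setnames(dt, c("name", class_col))
  dt[, name := as.character(name)]  # make sure the join column is character
  dt[]
}

# Hypothetical demo data: 20-column files with the rows in different orders.
make_file <- function(path, names, classes) {
  filler <- matrix(0L, nrow = length(names), ncol = 16)
  out <- data.frame(names, filler, classes, 0L, 0L)
  write.table(out, path, quote = FALSE, row.names = FALSE, col.names = FALSE)
}
demo_dir <- tempfile("demo_")
dir.create(demo_dir)
f_auto <- file.path(demo_dir, "sample_automat.txt")
f_manual <- sub("_automat", "", f_auto, fixed = TRUE)
make_file(f_auto, c("b", "a", "c"), c("x", "y", "z"))
make_file(f_manual, c("c", "b", "a"), c("p", "q", "r"))

dt1 <- read_cols(f_auto, "class1")
dt2 <- read_cols(f_manual, "class2")

# merge() matches rows on name, so neither table has to be sorted first.
both <- merge(dt1, dt2, by = "name")
print(both)
```

Because merge() matches on name rather than on row position, it also keeps working if one file is missing a name that the other has, whereas binding sorted columns side by side would silently misalign the rows.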
That might speed things up somewhat, but if you want a more significant improvement, I'd look into replacing your for loop with a parallel foreach loop from the foreach package. There are a number of questions about how to do that; you might want to start here: run a for loop in parallel in R
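As a rough sketch of what that could look like, assuming the doParallel package as the backend (one of several options) and files that have a header row with name and class1 columns. The demo files and the choice of two workers are made up for illustration:

```r
library(data.table)
library(foreach)
library(doParallel)

# Hypothetical demo files, each with a header row.
demo_dir <- tempfile("demo_")
dir.create(demo_dir)
temp <- file.path(demo_dir, sprintf("f%d_automat.txt", 1:3))
for (p in temp) {
  fwrite(data.table(name = c("b", "a"), class1 = c(2, 1)), p, sep = "\t")
}

# Start two worker processes and register them as the foreach backend.
cl <- parallel::makeCluster(2)
registerDoParallel(cl)

# Each iteration reads one file on a worker; the results come back as a list.
results <- foreach(path = temp, .packages = "data.table") %dopar% {
  fread(file = path,
        select = c(name = "character", class1 = "numeric"))[order(name)]
}

parallel::stopCluster(cl)
```

The list can then be combined with rbindlist(results) if a single table is needed. Whether this is actually faster depends on the number and size of the files, since starting the workers and sending the results back has its own cost.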