I am temporarily using a Mac with R, but it only runs on one core. I would like to run functions on large data sets using multiple cores.
I first tried running this before calling the function, but it did not work:
library(doParallel)
library(foreach)
cluster = makeCluster(detectCores())
registerDoParallel(cluster)
Is there a way to set R to work with multiple cores for all the analyses and functions you use?
CodePudding user response:
Answer: No. It's not possible to magically make R run in parallel everywhere.
To run code in parallel, you have to orchestrate the parallelization yourself using some of the parallel-processing frameworks, cf. https://cran.r-project.org/web/views/HighPerformanceComputing.html.
Some packages provide functions that can run in parallel. For many of them, parallelization can be controlled via an argument, and some via an R option, or another setting. You need to read the documentation for those functions to find out exactly how.
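To illustrate the "argument or R option" point, here is a minimal sketch using mclapply() from the parallel package that ships with R. Its mc.cores argument defaults to getOption("mc.cores", 2L), so the same call can be controlled either way. The slow_square() helper and the choice of 2 workers are made up for the example, and forking is not available on Windows, hence the guard.

```r
library(parallel)

# Hypothetical slow function, only here to have something to parallelize.
slow_square <- function(x) {
  Sys.sleep(0.1)
  x^2
}

# mclapply() forks the R process, which works on macOS/Linux but not on
# Windows, where mc.cores must be 1.
n_cores <- if (.Platform$OS.type == "windows") 1L else 2L

# 1) Control parallelization via an argument.
res_arg <- mclapply(1:4, slow_square, mc.cores = n_cores)

# 2) Control it via an R option; mclapply() reads "mc.cores" when the
#    argument is not supplied.
old_opts <- options(mc.cores = n_cores)
res_opt <- mclapply(1:4, slow_square)
options(old_opts)  # restore the previous setting

unlist(res_arg)
```

Other packages use different argument and option names, so this pattern has to be looked up per function.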
CodePudding user response:
Unfortunately, that's not how parallel computation works in R. R is not intrinsically multithreaded; in general, the functions from packages like parallel and foreach are used within a particular workflow or set of functions to specify that particular chunks of the computation should be run in parallel. If you're writing your own workflow, you can read (e.g.) the "using foreach" vignette that comes with the package to see how to make R compute in parallel. You would typically use foreach (or tools from one of the other available parallel-computation-providing packages, e.g. parallel, future, or furrr) to tell R to run chunks of your workflow in parallel.
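As a rough sketch of what that looks like with the setup from the question: registering the cluster only makes a backend available, and nothing runs in parallel until a chunk of work is explicitly handed to it with foreach and %dopar%. The loop body and the use of 2 workers (rather than detectCores()) are just assumptions for the example.

```r
library(doParallel)  # provides registerDoParallel(); loads parallel too
library(foreach)

# Start a small cluster and register it as the foreach backend.
cluster <- makeCluster(2)
registerDoParallel(cluster)

# Only code inside a %dopar% loop is sent to the workers; ordinary
# function calls elsewhere in the session still run on one core.
res <- foreach(i = 1:4, .combine = c) %dopar% {
  sqrt(i)
}

# Shut the workers down when finished.
stopCluster(cluster)

res
```

Replacing %dopar% with %do% runs the same loop sequentially, which is handy for checking that the results agree.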
As @HenrikB and @jared_mamrot say, some packages/functions have the possibility of parallelization built into them, e.g. the vroom package for loading data will automatically use all of the cores on your machine (this is not terribly well documented, e.g. see here); if you're using data.table, you can get it to use multiple cores (although the details may be complex depending on your OS/chipset).
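For data.table, a minimal sketch of checking and changing the thread count looks like this, assuming the package is installed. The value 2 is arbitrary. On a build without OpenMP support (which can happen on macOS, depending on how the package was compiled) the reported count may stay at 1 whatever you request.

```r
library(data.table)

# How many threads data.table is currently allowed to use; verbose = TRUE
# also prints the OpenMP details it detected.
getDTthreads(verbose = TRUE)

# Request 2 threads; the previous setting is returned so it can be restored.
old_threads <- setDTthreads(2)
n_now <- getDTthreads()

# Put the setting back the way it was.
setDTthreads(old_threads)

n_now
```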
The other level at which R can relatively easily do parallel computation is if the linear algebra libraries (BLAS/LAPACK) are configured for multithreaded computation; e.g. see here. If your workflow uses a lot of (the right kind of) linear algebra, this can make a lot of difference.
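To see which linear algebra libraries your R is linked against, the following base-R calls can help. This is only a sketch for inspection: it does not switch libraries, and the matrix size is arbitrary. If the reported BLAS is the reference implementation bundled with R, the timing below will be single-threaded.

```r
# Paths of the BLAS and LAPACK libraries in use (R >= 3.4.0).
si <- sessionInfo()
si$BLAS
si$LAPACK

# The same information from lower-level helpers.
blas_path <- extSoftVersion()[["BLAS"]]
lapack_path <- La_library()

# A matrix cross-product is the kind of operation an optimized,
# multithreaded BLAS speeds up; compare timings before and after
# changing the BLAS configuration.
set.seed(1)
m <- matrix(rnorm(500 * 500), nrow = 500)
system.time(cp <- crossprod(m))
```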