I am using RStudio Cloud and want to connect to Spark using the sparklyr
package. I tried both a local master and a YARN
master. The code is below.
library(sparklyr)
library(dplyr)
sc <- spark_connect(master = "local")
sc <- spark_connect(master = "yarn")
# Error in system2(file.path(spark_home, "bin", "spark-submit"), "--version", : error in running command
Neither worked. I don't know how to set up the Spark environment any further. Any help would be much appreciated.
CodePudding user response:
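This could be a problem with the Spark version, or Spark may not be installed in the project yet: the error is raised when sparklyr tries to run spark-submit from spark_home.
The following works fine for me in a new project on RStudio Cloud: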
install.packages("sparklyr")
library(sparklyr)
spark_install(version = "3.0.0")
sc <- spark_connect(master = "local")
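If you reopen the project often, the same steps can be wrapped so that Spark is only downloaded when it is missing. This is a minimal sketch, not something from the sparklyr documentation: connect_local_spark is a hypothetical helper name, and it assumes that spark_installed_versions() lists installed versions in a column named spark and that Java is available in the session.

```r
# Hypothetical helper (not part of sparklyr): install the requested Spark
# version only if it is not already present, then connect to a local master.
connect_local_spark <- function(version = "3.0.0") {
  # Data frame describing the Spark versions sparklyr has installed locally
  installed <- sparklyr::spark_installed_versions()

  # Download and install the requested version only when it is missing
  if (!(version %in% installed$spark)) {
    sparklyr::spark_install(version = version)
  }

  # Pin the connection to the same version that was just checked
  sparklyr::spark_connect(master = "local", version = version)
}

# Usage (the first run needs network access to download Spark):
# sc <- connect_local_spark("3.0.0")
# sparklyr::spark_disconnect(sc)
```

Passing version to spark_connect as well keeps the connection tied to the version that was installed, rather than to whichever one sparklyr picks by default.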