How to connect RStudio Cloud to Spark?

Time:03-13

I am using RStudio Cloud and want to connect to Spark with the sparklyr package. I tried both a local master and a YARN master, using the code below.

library(sparklyr)
library(dplyr)
sc <- spark_connect(master = "local")
sc <- spark_connect(master = "yarn")
# Error in system2(file.path(spark_home, "bin", "spark-submit"), "--version", : error in running command

Neither worked. I don't know how to set up the Spark environment any further. Any help would be much appreciated.

CodePudding user response:

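The error is raised while sparklyr runs spark-submit --version from the Spark home directory, so this could be a problem with the version of Spark (or with no usable Spark installation being found in the project).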

This works fine for me, on a new project on RStudio Cloud:

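install.packages("sparklyr")           # one-time package install
library(sparklyr)
spark_install(version = "3.0.0")       # download a local copy of Spark 3.0.0
sc <- spark_connect(master = "local")  # connect to that local installation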
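
If it still fails, it may help to check what sparklyr has actually installed before connecting. The following is only a minimal sketch, assuming Java is available in the project and that Spark 3.0.0 is the version you want; the variable names and the table name "mtcars_spark" are made up for illustration.

```r
library(sparklyr)

# Assumption: the same Spark version as in the answer above.
spark_version_wanted <- "3.0.0"

# Data frame of the Spark versions sparklyr has installed locally
# (it has a `spark` column holding the version strings).
installed <- spark_installed_versions()
print(installed)

# Install the wanted version only if it is not there yet.
if (!(spark_version_wanted %in% installed$spark)) {
  spark_install(version = spark_version_wanted)
}

# Connect to a local Spark instance, pinning the version explicitly
# so sparklyr does not pick a different installation.
sc <- spark_connect(master = "local", version = spark_version_wanted)

# Sanity checks: report the running Spark version and
# round-trip a small built-in data set through Spark.
print(spark_version(sc))
cars_tbl <- copy_to(sc, mtcars, "mtcars_spark", overwrite = TRUE)
print(sdf_nrow(cars_tbl))

spark_disconnect(sc)
```

Pinning version in spark_connect() is a deliberate choice here: with several Spark versions installed, it makes it explicit which one the connection uses.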

