Why I don't need to create a SparkSession in Databricks? Is a SparkSession created automatically when the cluster is configured? Or somebodyelse did it for me?
CodePudding user response:
That is done only in the notebooks, to simplify user's work & avoiding them to specify different parameters, many of them won't have any effect because Spark is already started. This behavior is similar to what you get when you start spark-shell
or pyspark
- both of them initialize the SparkSession
and SparkContext
:
Spark context available as 'sc' (master = local[*], app id = local-1635579272032).
SparkSession available as 'spark'.
But if you're running code from jar or Python wheel as job, then it's your responsibility to create corresponding objects.
CodePudding user response:
In Databricks environment, Whereas in Spark 2.0 the same effects can be achieved through SparkSession, without expliciting creating SparkConf, SparkContext or SQLContext, as they’re encapsulated within the SparkSession. Using a builder design pattern, it instantiates a SparkSession object if one does not already exist, along with its associated underlying contexts.ref: link