Why don't I need to create a SparkSession in Databricks?

Why don't I need to create a SparkSession in Databricks? Is a SparkSession created automatically when the cluster is configured? Or did somebody else do it for me?

CodePudding user response：

That is done only in notebooks, to simplify the user's work and to spare them from specifying various parameters, many of which would have no effect because Spark is already started. This behavior is similar to what you get when you start spark-shell or pyspark, both of which initialize the SparkSession and SparkContext:

Spark context available as 'sc' (master = local[*], app id = local-1635579272032).
SparkSession available as 'spark'.

But if you're running code from a jar or a Python wheel as a job, then it's your responsibility to create the corresponding objects.

CodePudding user response：

In the Databricks environment, the SparkSession is created for you. Since Spark 2.0, the same effects can be achieved through SparkSession without explicitly creating SparkConf, SparkContext or SQLContext, as they're encapsulated within the SparkSession. Using a builder design pattern, it instantiates a SparkSession object if one does not already exist, along with its associated underlying contexts. Ref: link