Why don't I need to create a SparkSession in Databricks?


Why don't I need to create a SparkSession in Databricks? Is a SparkSession created automatically when the cluster is configured, or did somebody else do it for me?

CodePudding user response:

That is done only in notebooks, to simplify users' work and spare them from specifying various parameters, many of which would have no effect anyway because Spark is already started. This behavior is similar to what you get when you start spark-shell or pyspark - both of them initialize the SparkSession and SparkContext:

Spark context available as 'sc' (master = local[*], app id = local-1635579272032).
SparkSession available as 'spark'.

But if you're running code from a JAR or Python wheel as a job, then it's your responsibility to create the corresponding objects.
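A minimal sketch of such a job entry point in PySpark (the app name and the sanity-check query are illustrative, not prescribed by Databricks):

    from pyspark.sql import SparkSession

    def main():
        # In a standalone job there is no pre-created `spark` object,
        # so build (or reuse) the session explicitly.
        spark = SparkSession.builder \
            .appName("my-databricks-job") \
            .getOrCreate()

        df = spark.range(10)   # small sanity-check DataFrame
        print(df.count())

        spark.stop()           # release resources when the job ends

    if __name__ == "__main__":
        main()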

CodePudding user response:

In the Databricks environment, as in Spark 2.0 generally, the same effect is achieved through SparkSession, without explicitly creating SparkConf, SparkContext, or SQLContext, as they're encapsulated within the SparkSession. Using a builder design pattern, it instantiates a SparkSession object if one does not already exist, along with its associated underlying contexts.
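A short sketch of that builder behavior, assuming a running PySpark environment:

    from pyspark.sql import SparkSession

    # getOrCreate() returns the existing session if one is already running,
    # so calling the builder twice yields the same object.
    s1 = SparkSession.builder.getOrCreate()
    s2 = SparkSession.builder.getOrCreate()
    assert s1 is s2

    # The older entry points are reachable through the session:
    sc = s1.sparkContext        # the underlying SparkContext
    conf = sc.getConf()         # the SparkConf it was started with
    print(conf.get("spark.app.name"))

This is why running the builder in a Databricks notebook is harmless but redundant: it simply hands back the session the platform already created.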
