I am pretty new to Spark and my question might be an absurd one.
We can create a DataFrame using spark.createDataFrame(data) and run SQL commands using spark.sql('select 1') without ever calling SparkSession.builder.appName('SomeName').getOrCreate().
Then why do we need to call SparkSession.builder...?
Answer:
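This behaviour comes from the environment, not from Spark itself. A Databricks notebook (like the interactive pyspark shell) already has a SparkSession running on the cluster and exposes it as the variable spark. Calling SparkSession.builder...getOrCreate() there simply returns that existing session. This way the user cannot override parameters, such as the application name, that were already set when Spark started on the cluster.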
If you are running a .jar, a Python wheel, or any standalone script, no spark variable is predefined, so you still need to obtain the SparkSession explicitly.