Why use SparkSession at the beginning of a notebook?

Time:08-16

I am pretty new to Spark and my question might be an absurd one.

We can create a DataFrame with spark.createDataFrame(data) and run SQL with spark.sql('select 1') without ever calling SparkSession.builder.appName('SomeName').getOrCreate()


Then why do we need to call SparkSession.builder...?

CodePudding user response:

You see this behaviour in a Databricks notebook because the session is already created for you and exposed as the variable spark. Spark is already running on the cluster, so the user cannot override any parameters that have already been set.

If you are running a .jar or Python wheel, you still need to create the SparkSession explicitly.
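As a minimal sketch of that explicit setup for a standalone Python job: the helper name get_spark, the function main, and the sample rows are made up for illustration, and the app name is the one from the question. Because getOrCreate() returns the session that is already running when there is one, the same helper should also be harmless inside a notebook.

```python
def get_spark(app_name="SomeName"):
    """Return the active SparkSession, creating one if none exists."""
    # Imported inside the function so the helper can be defined
    # even on a machine where pyspark is not installed.
    from pyspark.sql import SparkSession

    # getOrCreate() reuses an existing session (for example the one a
    # notebook already provides) and builds a new one otherwise.
    return SparkSession.builder.appName(app_name).getOrCreate()


def main():
    # Hypothetical job body: the same calls that work in the notebook,
    # but against a session this script obtained itself.
    spark = get_spark()
    df = spark.createDataFrame([(1, "a"), (2, "b")], ["id", "label"])
    df.show()
    spark.sql("select 1").show()
    spark.stop()
```

Calling main() from the script or wheel entry point runs the job. In a notebook you would normally skip spark.stop(), since the session there is shared.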
