How to automatically pass spark and dbutils to .py file in Databricks?


I have a main notebook in Databricks that runs my base set of code. Right now, I always have to pass "spark" and "dbutils" into my function to get it to work properly.

Main notebook code:

from subfolder import awesome

awesome.somefunction(spark,dbutils,parameterC)

The code within the awesome.py file is the following (it lives in a folder called "subfolder", one level below the main notebook, alongside an __init__.py file):

def somefunction(spark,dbutils,parameterC):
    # used spark in this function
    # used dbutils in this function
    # used parameterC in this function
    # create a spark view at the end
    # return None
    return None

If I remove spark and dbutils from the function, I get an error saying that "spark" or "dbutils" has not been found.

How can I avoid having to pass spark and dbutils into my .py file manually?

CodePudding user response:

On Databricks, spark and dbutils are automatically injected only into the main entry point - your notebook - but they aren't propagated to the Python modules you import. For spark the solution is easy: just call the getActiveSession function of the SparkSession class (SparkSession.getActiveSession()). For dbutils, you need to keep passing it explicitly unless you abstract obtaining dbutils into a helper function, as described in this answer.
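As a rough sketch (the get_dbutils helper below follows the commonly used pattern and isn't part of your existing code), awesome.py could look something like this:

from pyspark.sql import SparkSession

def get_dbutils(spark):
    # On a Databricks cluster, DBUtils can be constructed from the session;
    # when that import isn't available, fall back to the notebook's IPython namespace.
    try:
        from pyspark.dbutils import DBUtils
        return DBUtils(spark)
    except ImportError:
        import IPython
        return IPython.get_ipython().user_ns["dbutils"]

def somefunction(parameterC):
    spark = SparkSession.getActiveSession()  # reuse the session created by the notebook
    dbutils = get_dbutils(spark)
    # use spark, dbutils and parameterC as before
    # create a spark view at the end
    return None

The call in the main notebook then simplifies to awesome.somefunction(parameterC).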
