I have a main notebook in Databricks that runs my base set of code. Right now, I always have to pass "spark" and "dbutils" into my function to get it to work properly.
Main notebook code:
from subfolder import awesome
awesome.somefunction(spark, dbutils, parameterC)
The code within the awesome.py file is the following (this file lives in a folder called "subfolder", one level deeper than the main notebook, and is accompanied by an __init__.py file):
def somefunction(spark, dbutils, parameterC):
    # uses spark in this function
    # uses dbutils in this function
    # uses parameterC in this function
    # creates a Spark view at the end
    return None
If I remove spark and dbutils from the function's parameters, I get an error saying that "spark" or "dbutils" has not been found.
How can I set this up so I don't have to pass spark and dbutils to my .py file every time?
CodePudding user response:
On Databricks, spark and dbutils are automatically injected only into the main entry point, i.e. your notebook; they aren't propagated to imported Python modules. For spark the solution is easy: use the getActiveSession function of the SparkSession class (SparkSession.getActiveSession()). For dbutils, you need to keep passing it explicitly unless you abstract obtaining dbutils into a helper function, as described in this answer.
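A minimal sketch of what awesome.py could look like with this approach. The get_dbutils helper and the view name "my_view" are illustrative assumptions, not part of the original code; the dbutils lookup follows the commonly used pattern of trying pyspark.dbutils on a cluster and falling back to the object Databricks injects into the notebook's IPython namespace:

from pyspark.sql import SparkSession


def get_dbutils(spark):
    # Hypothetical helper: obtain a DBUtils handle without it being passed in.
    try:
        from pyspark.dbutils import DBUtils  # available on Databricks clusters
        return DBUtils(spark)
    except ImportError:
        # Fall back to the dbutils object injected into the notebook's IPython user namespace.
        import IPython
        return IPython.get_ipython().user_ns["dbutils"]


def somefunction(parameterC):
    # Reuse the session created by the notebook instead of receiving it as an argument.
    spark = SparkSession.getActiveSession()
    dbutils = get_dbutils(spark)

    # ... use spark, dbutils and parameterC here ...

    # Create a Spark view at the end (view name and data are just examples).
    df = spark.createDataFrame([(parameterC,)], ["value"])
    df.createOrReplaceTempView("my_view")
    return None

With this in place, the notebook only needs to call awesome.somefunction(parameterC).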