Using databricks-connect to debug a notebook that runs another notebook


I am able to connect to an Azure Databricks cluster from my CentOS Linux VM using Visual Studio Code.

The code below even works without any issue:

from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

print("Cluster access test - ", spark.range(100).count())

setting = spark.conf.get("spark.master")  # returns local[*] under databricks-connect
if "local" in setting:
    # running via databricks-connect, so construct dbutils explicitly
    from pyspark.dbutils import DBUtils
    dbutils = DBUtils(spark)
else:
    print("Do nothing - dbutils should be available already")

out = dbutils.fs.ls('/FileStore/')
print(out)

I have a notebook locally that runs another notebook using %run path/anothernotebook.

Since the %run line appears as a comment (#) in the local file, Python does not execute it.

So I tried dbutils.notebook.run('pathofnotebook') instead, but it errors out:

Exception has occurred: AttributeError
'SparkServiceClientDBUtils' object has no attribute 'notebook'

Is it possible to locally debug a notebook that invokes another notebook?

CodePudding user response:

It's not possible - the dbutils implementation included in Databricks Connect supports only the fs and secrets submodules (see docs).
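
In practice that means only calls like these go through locally; the secret scope and key names below are placeholders:

# Supported through Databricks Connect: the fs and secrets submodules
print(dbutils.fs.ls('/FileStore/'))
token = dbutils.secrets.get(scope="my-scope", key="my-key")  # placeholder names

# Not supported: dbutils.notebook, dbutils.widgets, dbutils.library, ...
# dbutils.notebook.run('path/anothernotebook', 60)  # raises AttributeError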

Databricks Connect is designed to work with code developed locally, not with notebooks. If you can package the content of that notebook as a Python package, then you'll be able to debug it.
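
For example, if the invoked notebook mostly defines transformations, you could move its content into a plain Python module and import it from the calling code; names like anothernotebook.py and transform below are hypothetical:

# anothernotebook.py - the former notebook's logic, refactored into a module
from pyspark.sql import SparkSession, DataFrame

def transform(spark: SparkSession) -> DataFrame:
    # stand-in for whatever the notebook used to compute
    return spark.range(100).withColumnRenamed("id", "value")

# main.py - the calling code, now debuggable locally via databricks-connect
from pyspark.sql import SparkSession
from anothernotebook import transform

spark = SparkSession.builder.getOrCreate()
print(transform(spark).count())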

P.S. Please take into account that dbutils.notebook.run executes the notebook as a separate job, in contrast to %run, which runs the other notebook's code in the caller's context.
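
To illustrate the difference (both forms only work when executed on Databricks itself, not through Databricks Connect):

# %run inlines the other notebook into the current session, so its
# variables and functions become visible here (notebook cell magic):
# %run /path/anothernotebook

# dbutils.notebook.run starts the target notebook as a separate job;
# only string arguments go in, and the single string passed to
# dbutils.notebook.exit(...) in the child comes back:
result = dbutils.notebook.run("/path/anothernotebook", 60, {"param": "value"})
print(result)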
