Scala/Java library not installing on execution of Databricks Notebook


At work I have a Scala Databricks notebook that uses many library imports, both from Maven and from some JAR files. The issue is that when I schedule jobs on this notebook, it sometimes fails (seemingly at random, roughly 1 run in 10) because the cells start executing before all the libraries are installed. The job then fails and I have to relaunch it manually. This behavior makes the notebook hard to use in production, since it fails intermittently.

I tried putting a Thread.sleep() of a minute or so before all my imports, but it does not change anything. For Python there is dbutils.library.installPyPI("library-name"), but there is no equivalent for Scala in the dbutils documentation.

Has anyone had the same issue, and if so, how did you solve it?

Thank you!

CodePudding user response:

Simply put: for scheduled production jobs, use a New Job Cluster and avoid an All-Purpose Cluster.

New job clusters are dedicated clusters that are created and started when you run a task and terminated immediately after the task completes. In production, Databricks recommends using new job clusters so that each task runs in a fully isolated environment.

In the UI, when setting up your notebook job, select a New Job Cluster and then add all the dependent libraries to the job. Because the libraries are part of the job definition, Databricks installs them on the cluster before the notebook task starts, which removes the race condition between library installation and cell execution. A sketch of the equivalent Jobs API definition follows below.
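For reference, a job definition with a new job cluster and attached libraries might look roughly like the following via the Jobs API 2.1. The job name, notebook path, Maven coordinates, JAR path, and cluster settings are all placeholders; adjust them to your workspace:

    {
      "name": "scala-notebook-job",
      "tasks": [
        {
          "task_key": "run_notebook",
          "notebook_task": {
            "notebook_path": "/Workspace/Users/me/MyNotebook"
          },
          "new_cluster": {
            "spark_version": "11.3.x-scala2.12",
            "node_type_id": "i3.xlarge",
            "num_workers": 2
          },
          "libraries": [
            { "maven": { "coordinates": "org.typelevel:cats-core_2.12:2.9.0" } },
            { "jar": "dbfs:/FileStore/jars/my-internal-lib.jar" }
          ]
        }
      ]
    }

With a definition like this, the cluster is created, the listed libraries are installed on it, and only then does the notebook run.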

The pricing is different for new job clusters: jobs compute is billed at a lower DBU rate than all-purpose compute, so it usually ends up cheaper.

Note: use Databricks pools to reduce cluster start and auto-scaling times (if startup latency is an issue to begin with).
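If you go that route, the new_cluster block references the pool instead of a node type. A minimal sketch, with a placeholder pool ID:

    {
      "spark_version": "11.3.x-scala2.12",
      "instance_pool_id": "1234-567890-pool123",
      "num_workers": 2
    }

The cluster then draws idle instances from the pool instead of provisioning fresh ones, which shortens startup.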
