I have a shared cluster on Databricks that is used by several jobs. When I update the jar that a job depends on and then launch the job, the updated jar is not used; on the cluster I can see that it still runs an old version of the jar.
To clarify, I publish the jar through the Databricks API 2.0.
My question is: why does the execution on the cluster always use an old version when I start my job? Thank you for your help.
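For context, a publish step like the one described would typically go through the cluster Libraries API 2.0. A minimal sketch of such a call, assuming that is the API meant here (the workspace URL, token, cluster id and jar path are placeholders):

```python
# Sketch only: attaching an updated jar to a cluster via the Libraries API 2.0.
# All identifiers below are placeholders for illustration.
import requests

HOST = "https://<workspace-url>"
HEADERS = {"Authorization": "Bearer <personal-access-token>"}

requests.post(
    f"{HOST}/api/2.0/libraries/install",
    headers=HEADERS,
    json={
        "cluster_id": "<shared-cluster-id>",
        "libraries": [{"jar": "dbfs:/FileStore/jars/my-job.jar"}],
    },
).raise_for_status()
```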
CodePudding user response:
The old jar is removed from the cluster only when the cluster is terminated. If you have a shared cluster that never terminates, that never happens. This is a limitation not of Databricks but of the JVM, which can't unload classes that are already in use (or at least it's very hard to do reliably).
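If you have to keep using the shared cluster, the practical workaround is to restart it after publishing the new jar, so the old classes are dropped and the updated library is loaded on the next run. A minimal sketch using the Clusters API 2.0 (workspace URL, token and cluster id are placeholders):

```python
# Sketch: restart the shared cluster so the JVM drops the old classes and
# loads the newly installed jar on the next job run. Placeholders throughout.
import requests

HOST = "https://<workspace-url>"
HEADERS = {"Authorization": "Bearer <personal-access-token>"}

requests.post(
    f"{HOST}/api/2.0/clusters/restart",
    headers=HEADERS,
    json={"cluster_id": "<shared-cluster-id>"},
).raise_for_status()
```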
In most cases it's really not recommended to run jobs on a shared cluster, for several reasons:
- it costs significantly more (~4x compared to a job cluster)
- jobs running on it affect each other's performance
- there is a high probability of dependency conflicts, and you can't update libraries without affecting the other jobs
- a kind of "garbage" accumulates on the driver node over time
- ...
If you use a shared cluster to get faster job startup, I recommend looking into Instance Pools, especially in combination with preloading the Databricks Runtime onto the nodes in the instance pool.
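A rough sketch of that setup, assuming the standard Instance Pools API 2.0 and Jobs API 2.0 (pool name, node type, sizes and the runtime version are placeholders): each run then gets a fresh job cluster from the warm pool, so it always loads the jar version currently attached to the job.

```python
# Sketch: an instance pool with a preloaded Databricks Runtime, plus a job
# cluster spec that draws from it. Names, sizes and versions are placeholders.
import requests

HOST = "https://<workspace-url>"
HEADERS = {"Authorization": "Bearer <personal-access-token>"}

pool = requests.post(
    f"{HOST}/api/2.0/instance-pools/create",
    headers=HEADERS,
    json={
        "instance_pool_name": "jobs-pool",
        "node_type_id": "i3.xlarge",                       # node type for your cloud
        "min_idle_instances": 2,                           # keep warm nodes ready
        "idle_instance_autotermination_minutes": 30,
        "preloaded_spark_versions": ["11.3.x-scala2.12"],  # runtime preloaded on idle nodes
    },
).json()

# new_cluster block for /api/2.0/jobs/create: each run starts a fresh cluster
# drawn from the pool, so it always installs the jar currently attached to the job.
new_cluster = {
    "instance_pool_id": pool["instance_pool_id"],
    "spark_version": "11.3.x-scala2.12",
    "num_workers": 2,
}
```

Keeping `spark_version` equal to one of the preloaded versions is what avoids downloading the runtime image when the job cluster starts.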