Azure Databricks Execution Fail - CLOUD_PROVIDER_LAUNCH_FAILURE


I'm using Azure Data Factory (ADF) for my data ingestion and running an Azure Databricks notebook through ADF's Notebook activity.

The notebook uses an existing instance pool of Standard_DS3_v2 nodes (autoscaling between 2 and 5 nodes) on Databricks Runtime 7.3 LTS. The same Azure subscription is used by multiple teams for their respective data pipelines.

During pipeline execution, the notebook activity frequently fails with the error message below:

{
  "reason": {
    "code": "CLOUD_PROVIDER_LAUNCH_FAILURE",
    "type": "CLOUD_FAILURE",
    "parameters": {
      "azure_error_code": "SubnetIsFull",
      "azure_error_message": "Subnet /subscriptions/<Subscription>/resourceGroups/<RG>/providers/Microsoft.Network/virtualNetworks/<VN>/subnets/<subnet> with address prefix 10.237.35.128/26 does not have enough capacity for 2 IP addresses."
    }
  }
}

Can anyone explain what this error means and how I can reduce how often it occurs? (The documentation I've found isn't very explanatory.)

CodePudding user response:

The problem arises from the fact that when your workspace was created, the network and subnet sizes weren't planned correctly (see the docs). As a result, when you try to launch a cluster, there aren't enough free IP addresses in the given subnet, and you get this error.
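For intuition, here is a minimal sketch of the address math behind that error message, using only Python's standard library. It assumes the standard Azure rule that 5 addresses are reserved in every subnet, and that each Databricks node consumes one IP in the subnet (in a VNet-injected workspace a node actually takes one IP in each of the two Databricks subnets):

import ipaddress

# Address math behind the SubnetIsFull error. Assumptions: Azure reserves
# 5 addresses in every subnet (network, broadcast, plus 3 platform IPs),
# and each Databricks node consumes one IP in this subnet.
subnet = ipaddress.ip_network("10.237.35.128/26")  # prefix from the error
AZURE_RESERVED = 5

usable = subnet.num_addresses - AZURE_RESERVED
print(f"{subnet}: {subnet.num_addresses} addresses, {usable} usable for nodes")
# -> 10.237.35.128/26: 64 addresses, 59 usable for nodes

With several teams' autoscaling clusters sharing those 59 addresses, intermittent SubnetIsFull failures at peak times are exactly what you'd expect.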

Unfortunately, it's currently not possible to expand the network/subnet size, so if you need a bigger network, you need to deploy a new workspace and migrate into it.
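If you do deploy a new workspace, a rough way to size its subnets is to work backwards from the largest number of nodes all teams might run concurrently. A small sketch of that calculation (my own helper, not an official formula; it assumes the same 5 reserved addresses per subnet, and both the host and container subnets need at least the resulting size):

import math

AZURE_RESERVED = 5  # addresses Azure reserves in every subnet

def min_prefix_for_nodes(max_concurrent_nodes: int) -> int:
    """Smallest CIDR prefix whose subnet fits the node IPs plus
    Azure's reserved addresses."""
    needed = max_concurrent_nodes + AZURE_RESERVED
    return 32 - math.ceil(math.log2(needed))

for nodes in (59, 100, 250):
    prefix = min_prefix_for_nodes(nodes)
    print(f"{nodes:>3} concurrent nodes -> /{prefix} "
          f"({2 ** (32 - prefix)} addresses)")
# ->  59 concurrent nodes -> /26 (64 addresses)
#    100 concurrent nodes -> /25 (128 addresses)
#    250 concurrent nodes -> /24 (256 addresses)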

CodePudding user response:

It looks like your Databricks workspace was created within a VNet (see this link or this link). When this is done, the Databricks instances are created within the subnets of that VNet. It seems that at the point of triggering, all the IPs within the subnet were already in use. You cannot and should not extend the IP space, and please do not attempt to change the existing VNet configuration, as this will affect your Databricks clusters. You have the following options:

  1. Check when fewer Databricks instances are being instantiated and schedule your ADF pipeline in that window. Aim to distribute executions across time so you don't exceed the IPs available in the subnet at peak (see the sketch after this list).
  2. Ask your IT department to create a new VNet and subnets, and deploy a new Databricks workspace into that VNet.
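For option 1, rather than guessing at the quiet hours, you can measure subnet utilization directly. A minimal sketch using the azure-mgmt-network SDK (assuming the azure-identity and azure-mgmt-network packages are installed and the caller has read access to the VNet; the resource names below are the placeholders from the error message, not real values):

from azure.identity import DefaultAzureCredential
from azure.mgmt.network import NetworkManagementClient

# Placeholders copied from the error message -- substitute real values.
SUBSCRIPTION_ID = "<Subscription>"
RESOURCE_GROUP = "<RG>"
VNET_NAME = "<VN>"

client = NetworkManagementClient(DefaultAzureCredential(), SUBSCRIPTION_ID)

# list_usage reports per-subnet IP consumption: current_value is the number
# of allocated IPs, limit is the usable total (subnet size minus Azure's
# reserved addresses).
for usage in client.virtual_networks.list_usage(RESOURCE_GROUP, VNET_NAME):
    subnet_name = usage.id.split("/")[-1]
    free = int(usage.limit - usage.current_value)
    print(f"{subnet_name}: {free} free of {int(usage.limit)} usable IPs")

Logging this over a few days shows when the Databricks subnets actually have headroom, which is a more reliable basis for scheduling than trial and error.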