Create Azure Key Vault backed secret scope in Databricks with AAD Token

Time:03-10

My ultimate goal is to mount ADLS gen2 containers into my Databricks workspace as part of my Terraform-managed deployment under the auspices of an Azure Service Principal. This is a single deployment that creates all the Azure resources (networking, firewall, storage accounts, Databricks workspaces, etc.) and then configures the Databricks workspace, using the Databricks Terraform provider.

This answer says I cannot do AAD passthrough mounting with a Service Principal, which means I have to use OAuth2 authentication, for which I need an Azure Key Vault backed secret scope in Databricks. The Terraform documentation says such a scope can only be created with user-based authentication, not with my Service Principal.

So I thought maybe I could implement a hack: Create a Databricks PAT in Terraform (again, always as the Service Principal), then use the Terraform external resource to "shell out" to the Databricks CLI, authenticating with this PAT. I tried this manually and got this error:

{
  "error_code": "INVALID_PARAMETER_VALUE",
  "message": "Scope with Azure KeyVault must have userAADToken defined!"
}

This stands to reason, because the PAT is created for the Service Principal. However, as an alternative, this answer suggests using Azure AD token authentication rather than the PAT. So down that rabbit hole I go!

I can get the Azure AD token following Microsoft's documentation, then use that to authenticate for the Databricks CLI:

export ARM_TENANT_ID="..."
export ARM_CLIENT_ID="..."
export ARM_CLIENT_SECRET="..."

# 2ff814a6-3304-4ab8-85cb-cd0e6f879c1d is the Azure Databricks resource ID.
# -sS hides the curl progress meter but still prints errors.
export DATABRICKS_AAD_TOKEN="$(curl -sS -X POST \
                                    -H 'Content-Type: application/x-www-form-urlencoded' \
                                    -d "client_id=${ARM_CLIENT_ID}" \
                                    -d 'grant_type=client_credentials' \
                                    -d 'scope=2ff814a6-3304-4ab8-85cb-cd0e6f879c1d/.default' \
                                    -d "client_secret=${ARM_CLIENT_SECRET}" \
                                    "https://login.microsoftonline.com/${ARM_TENANT_ID}/oauth2/v2.0/token" \
                             | jq -r .access_token)"

databricks configure --aad-token --host https://my-databricks-host.com
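
One pitfall with the token request above: if the request fails, `jq -r .access_token` prints the literal string `null`, and that string would then be handed to the CLI as if it were a token. A minimal sketch of a guard, assuming a hypothetical helper named `check_aad_token` (not part of the Databricks CLI or of the original setup):

```shell
# Hypothetical helper: succeeds only if the argument looks like a real token,
# i.e. it is neither empty nor the literal "null" that jq -r emits for a
# missing access_token field.
check_aad_token() {
  case "$1" in
    ""|null)
      echo "AAD token request failed: no access_token in the response" >&2
      return 1
      ;;
  esac
  return 0
}
```

It could be called as `check_aad_token "$DATABRICKS_AAD_TOKEN" || exit 1` before running `databricks configure`. It only catches a failed request; it says nothing about whether the token belongs to a user or a Service Principal.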

This authentication works: I can run various CLI commands (e.g., databricks tokens list) that return the expected result. However, now when I try to create the secret scope, it gives me a completely different error:

databricks secrets create-scope --scope "test" \
                                --scope-backend-type AZURE_KEYVAULT \
                                --resource-id "/subscriptions/my/key/vault/resource/id" \
                                --dns-name "https://my-vault-name.vault.azure.net/"

Error: Your authentication information may be incorrect. Please reconfigure with ``dbfs configure``
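
The CLI message above is generic, so it may help to call the Secrets REST API directly and look at the raw response. The following is only a sketch: the endpoint path `/api/2.0/secrets/scopes/create` and the payload field names are written from my reading of the Databricks Secrets API and should be checked against the current documentation, and the function names `scope_payload` and `create_kv_scope` are made up for illustration. The payload is built with `printf`, so it assumes the values contain no characters that need JSON escaping.

```shell
# Hypothetical helper: build the JSON body for a Key Vault backed scope.
# $1 = scope name, $2 = Key Vault resource ID, $3 = Key Vault DNS name
scope_payload() {
  printf '{"scope":"%s","scope_backend_type":"AZURE_KEYVAULT","backend_azure_keyvault":{"resource_id":"%s","dns_name":"%s"}}' \
    "$1" "$2" "$3"
}

# Hypothetical helper: POST the payload using the AAD token as a bearer token.
# $1 = workspace URL; the remaining arguments are passed to scope_payload.
# Endpoint path is an assumption, see the note above.
create_kv_scope() {
  host="$1"
  shift
  curl -sS -X POST "${host}/api/2.0/secrets/scopes/create" \
       -H "Authorization: Bearer ${DATABRICKS_AAD_TOKEN}" \
       -H 'Content-Type: application/json' \
       -d "$(scope_payload "$@")"
}
```

With the values from the question this would be invoked as `create_kv_scope https://my-databricks-host.com test "/subscriptions/my/key/vault/resource/id" "https://my-vault-name.vault.azure.net/"`, and the JSON error body, if any, is printed as returned by the workspace.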

My first question would be: Is my hack even going to work? If it is, where am I going wrong with the AAD token authentication? If it isn't going to work, is my ultimate goal even possible, or would I have to run several Terraform deployments, each with its own state, in phases, under different AAD identities (Service Principal and regular user)?

Answer:

Correct: you can’t do that using an AAD token issued for a service principal; it works only with the AAD token of a real user. This is a well-known and well-documented limitation of Azure, and hopefully it will be fixed in the future.

This is one of the major roadblocks to implementing end-to-end automated provisioning of Azure Databricks workspaces.
