I need some assistance to understand the various forms of logging in to Databricks. I am using Terraform to provision Azure Databricks. I would like to know the difference between the two configurations below. When I use option 1, I get the error shown.
Option 1:
terraform {
  required_providers {
    azuread     = "~> 1.0"
    azurerm     = "~> 2.0"
    azuredevops = { source = "registry.terraform.io/microsoft/azuredevops", version = "~> 0.0" }
    databricks  = { source = "registry.terraform.io/databrickslabs/databricks", version = "~> 0.0" }
  }
}

provider "random" {}

provider "azuread" {
  tenant_id     = var.project.arm.tenant.id
  client_id     = var.project.arm.client.id
  client_secret = var.secret.arm.client.secret
}

provider "databricks" {
  host          = azurerm_databricks_workspace.db-workspace.workspace_url
  azure_use_msi = true
}

resource "azurerm_databricks_workspace" "db-workspace" {
  name                          = module.names-db-workspace.environment.databricks_workspace.name_unique
  resource_group_name           = module.resourcegroup.resource_group.name
  location                      = module.resourcegroup.resource_group.location
  sku                           = "premium"
  public_network_access_enabled = true

  custom_parameters {
    no_public_ip        = true
    virtual_network_id  = module.virtualnetwork["centralus"].virtual_network.self.id
    public_subnet_name  = module.virtualnetwork["centralus"].virtual_network.subnets["db-sub-1-public"].name
    private_subnet_name = module.virtualnetwork["centralus"].virtual_network.subnets["db-sub-2-private"].name
    public_subnet_network_security_group_association_id  = module.virtualnetwork["centralus"].virtual_network.nsgs.associations.subnets["databricks-public-nsg-db-sub-1-public"].id
    private_subnet_network_security_group_association_id = module.virtualnetwork["centralus"].virtual_network.nsgs.associations.subnets["databricks-private-nsg-db-sub-2-private"].id
  }

  tags = local.tags
}
Databricks Cluster Creation
resource "databricks_cluster" "dbcselfservice" {
cluster_name = format("adb-cluster-%s-%s", var.project.name, var.project.environment.name)
spark_version = var.spark_version
node_type_id = var.node_type_id
autotermination_minutes = 20
autoscale {
min_workers = 1
max_workers = 7
}
azure_attributes {
availability = "SPOT_AZURE"
first_on_demand = 1
spot_bid_max_price = 100
}
depends_on = [
azurerm_databricks_workspace.db-workspace
]
}
Databricks Workspace RBAC Permission
resource "databricks_group" "db-group" {
display_name = format("adb-users-%s", var.project.name)
allow_cluster_create = true
allow_instance_pool_create = true
depends_on = [
resource.azurerm_databricks_workspace.db-workspace
]
}
resource "databricks_user" "dbuser" {
count = length(local.display_name)
display_name = local.display_name[count.index]
user_name = local.user_name[count.index]
workspace_access = true
depends_on = [
resource.azurerm_databricks_workspace.db-workspace
]
}
Adding Members to Databricks Admin Group
resource "databricks_group_member" "i-am-admin" {
for_each = toset(local.email_address)
group_id = data.databricks_group.admins.id
member_id = databricks_user.dbuser[index(local.email_address, each.key)].id
depends_on = [
resource.azurerm_databricks_workspace.db-workspace
]
}
data "databricks_group" "admins" {
display_name = "admins"
depends_on = [
# resource.databricks_cluster.dbcselfservice,
resource.azurerm_databricks_workspace.db-workspace
]
}
The error that I get during terraform apply is below:
Error: User not authorized
with databricks_user.dbuser[1],
on resources.adb.tf line 80, in resource "databricks_user" "dbuser":
80: resource "databricks_user" "dbuser"{
Error: User not authorized
with databricks_user.dbuser[0],
on resources.adb.tf line 80, in resource "databricks_user" "dbuser":
80: resource "databricks_user" "dbuser"{
Error: cannot refresh AAD token: adal:Refresh request failed. Status Code = '500'. Response body: {"error":"server_error", "error_description":"Internal server error"} Endpoint http://169.254.169.254/metadata/identity/oauth2/token?api-version=2018-02-01&resource=https://management.core.windows.net/
with databricks_group.db-group,
on resources.adb.tf line 80, in resource "databricks_group" "db-group":
71: resource "databricks_group" "db-group"{
Is the error coming from the block below?
provider "databricks" {
host = azurerm_databricks_workspace.db-workspace.workspace_url
azure_use_msi = true
}
I just need to be logged in automatically when I click the workspace URL from the portal. So what should I use for that? And why do we need to declare the databricks provider twice, once under required_providers and again in the provider "databricks" block? I have seen that if I don't provide the second block, I get the error:
"authentication is not configured for provider"
CodePudding user response:
As mentioned in the comments, if you are using Azure CLI authentication, i.e. az login with your username and password, then you can use the code below:
terraform {
  required_providers {
    databricks = {
      source  = "databrickslabs/databricks"
      version = "0.3.11"
    }
  }
}

provider "azurerm" {
  features {}
}

provider "databricks" {
  host = azurerm_databricks_workspace.example.workspace_url
}

resource "azurerm_databricks_workspace" "example" {
  name                          = "DBW-ansuman"
  resource_group_name           = azurerm_resource_group.example.name
  location                      = azurerm_resource_group.example.location
  sku                           = "premium"
  managed_resource_group_name   = "ansuman-DBW-managed-without-lb"
  public_network_access_enabled = true

  custom_parameters {
    no_public_ip        = true
    public_subnet_name  = azurerm_subnet.public.name
    private_subnet_name = azurerm_subnet.private.name
    virtual_network_id  = azurerm_virtual_network.example.id
    public_subnet_network_security_group_association_id  = azurerm_subnet_network_security_group_association.public.id
    private_subnet_network_security_group_association_id = azurerm_subnet_network_security_group_association.private.id
  }

  tags = {
    Environment = "Production"
    Pricing     = "Standard"
  }
}

data "databricks_node_type" "smallest" {
  local_disk = true

  depends_on = [
    azurerm_databricks_workspace.example
  ]
}

data "databricks_spark_version" "latest_lts" {
  long_term_support = true

  depends_on = [
    azurerm_databricks_workspace.example
  ]
}

resource "databricks_cluster" "dbcselfservice" {
  cluster_name            = "Shared Autoscaling"
  spark_version           = data.databricks_spark_version.latest_lts.id
  node_type_id            = data.databricks_node_type.smallest.id
  autotermination_minutes = 20

  autoscale {
    min_workers = 1
    max_workers = 7
  }

  azure_attributes {
    availability       = "SPOT_AZURE"
    first_on_demand    = 1
    spot_bid_max_price = 100
  }

  depends_on = [
    azurerm_databricks_workspace.example
  ]
}

resource "databricks_group" "db-group" {
  display_name               = "adb-users-admin"
  allow_cluster_create       = true
  allow_instance_pool_create = true

  depends_on = [
    resource.azurerm_databricks_workspace.example
  ]
}

resource "databricks_user" "dbuser" {
  display_name     = "Rahul Sharma"
  user_name        = "[email protected]"
  workspace_access = true

  depends_on = [
    resource.azurerm_databricks_workspace.example
  ]
}

resource "databricks_group_member" "i-am-admin" {
  group_id  = databricks_group.db-group.id
  member_id = databricks_user.dbuser.id

  depends_on = [
    resource.azurerm_databricks_workspace.example
  ]
}
If you are using a Service Principal for authentication, then you can use something like the below:
terraform {
  required_providers {
    databricks = {
      source  = "databrickslabs/databricks"
      version = "0.3.11"
    }
  }
}

provider "azurerm" {
  subscription_id = "948d4068-xxxx-xxxx-xxxx-e00a844e059b"
  tenant_id       = "72f988bf-xxxx-xxxx-xxxx-2d7cd011db47"
  client_id       = "f6a2f33d-xxxx-xxxx-xxxx-d713a1bb37c0"
  client_secret   = "inl7Q~Gvdxxxx-xxxx-xxxxyaGPF3uSoL"

  features {}
}

provider "databricks" {
  host                = azurerm_databricks_workspace.example.workspace_url
  azure_client_id     = "f6a2f33d-xxxx-xxxx-xxxx-d713a1bb37c0"
  azure_client_secret = "inl7Q~xxxx-xxxx-xxxxg6ntiyaGPF3uSoL"
  azure_tenant_id     = "72f988bf-xxxx-xxxx-xxxx-2d7cd011db47"
}

resource "azurerm_databricks_workspace" "example" {
  name                          = "DBW-ansuman"
  resource_group_name           = azurerm_resource_group.example.name
  location                      = azurerm_resource_group.example.location
  sku                           = "premium"
  managed_resource_group_name   = "ansuman-DBW-managed-without-lb"
  public_network_access_enabled = true

  custom_parameters {
    no_public_ip        = true
    public_subnet_name  = azurerm_subnet.public.name
    private_subnet_name = azurerm_subnet.private.name
    virtual_network_id  = azurerm_virtual_network.example.id
    public_subnet_network_security_group_association_id  = azurerm_subnet_network_security_group_association.public.id
    private_subnet_network_security_group_association_id = azurerm_subnet_network_security_group_association.private.id
  }

  tags = {
    Environment = "Production"
    Pricing     = "Standard"
  }
}

data "databricks_node_type" "smallest" {
  local_disk = true

  depends_on = [
    azurerm_databricks_workspace.example
  ]
}

data "databricks_spark_version" "latest_lts" {
  long_term_support = true

  depends_on = [
    azurerm_databricks_workspace.example
  ]
}

resource "databricks_cluster" "dbcselfservice" {
  cluster_name            = "Shared Autoscaling"
  spark_version           = data.databricks_spark_version.latest_lts.id
  node_type_id            = data.databricks_node_type.smallest.id
  autotermination_minutes = 20

  autoscale {
    min_workers = 1
    max_workers = 7
  }

  azure_attributes {
    availability       = "SPOT_AZURE"
    first_on_demand    = 1
    spot_bid_max_price = 100
  }

  depends_on = [
    azurerm_databricks_workspace.example
  ]
}

resource "databricks_group" "db-group" {
  display_name               = "adb-users-admin"
  allow_cluster_create       = true
  allow_instance_pool_create = true

  depends_on = [
    resource.azurerm_databricks_workspace.example
  ]
}

resource "databricks_user" "dbuser" {
  display_name     = "Rahul Sharma"
  user_name        = "[email protected]"
  workspace_access = true

  depends_on = [
    resource.azurerm_databricks_workspace.example
  ]
}

resource "databricks_group_member" "i-am-admin" {
  group_id  = databricks_group.db-group.id
  member_id = databricks_user.dbuser.id

  depends_on = [
    resource.azurerm_databricks_workspace.example
  ]
}
And why do we need to provide two times databricks providers, once under required_providers and again in provider "databricks"?

The required_providers block is used to download and initialize the required provider from its source, i.e. the Terraform Registry. The provider block is used for further configuration of that downloaded provider, such as the client_id or the features block, which can be used for authentication or other configuration.
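As a minimal sketch of how the two declarations fit together (the host URL here is a hypothetical placeholder):

terraform {
  required_providers {
    # Tells Terraform which plugin to download and from where.
    databricks = {
      source  = "databrickslabs/databricks"
      version = "0.3.11"
    }
  }
}

# Configures the downloaded plugin: how it should connect and authenticate.
provider "databricks" {
  host = "https://adb-1234567890123456.7.azuredatabricks.net" # hypothetical workspace URL
}

Without the provider block, the plugin is installed but has no connection or authentication details, which is why you see the "authentication is not configured for provider" error.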
CodePudding user response:
The azure_use_msi option is primarily intended for use from CI/CD pipelines that are executed on machines with a managed identity assigned to them. All possible authentication options are described in the documentation, but the simplest way is to authenticate via the Azure CLI, so you just need to leave only the host parameter in the provider block. If you don't have the Azure CLI on that machine, you can use a combination of host and personal access token instead.
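A minimal sketch of that host-plus-token variant (the variable holding the token is a placeholder you would define yourself):

provider "databricks" {
  host  = azurerm_databricks_workspace.example.workspace_url
  token = var.databricks_pat # hypothetical variable holding a personal access token generated in the workspace
}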
If you're running that code from a machine with an assigned managed identity, then you need to make sure that this identity is either added into the workspace or has Contributor access to it - see the Azure Databricks documentation for more details.
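One way to grant that access from the same Terraform configuration is a role assignment on the workspace; a sketch, assuming you know the managed identity's object ID (the variable name is a placeholder):

resource "azurerm_role_assignment" "msi_workspace_contributor" {
  scope                = azurerm_databricks_workspace.db-workspace.id
  role_definition_name = "Contributor"
  principal_id         = var.msi_principal_id # hypothetical variable: object ID of the managed identity
}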