Azure Batch NodePreparationError trying to fetch Docker image from Azure Container Registry

Time:09-30

I'm trying to run an Azure Batch task on an Ubuntu VM with an image pulled from a private Azure Container Registry. The nodes in the pool fail on creation with the following error, whether I pre-fetch or not:

Code: NodePreparationError

Message:
An error occurred during node preparation

Values:
Error - Hit unexpected error installing containers
Message - 400, message='Bad Request', url=URL('http://169.254.169.254/metadata/identity/oauth2/token?api-version=2018-02-01&resource=https://management.azure.com/&mi_res_id=/subscriptions/7bd2fd6e-1cb6-4db2-82fe-67c7ea3024cd/resourceGroups/SANDBOX/providers/Microsoft.ManagedIdentity/userAssignedIdentities/my_uami')

Baseline: I have an Azure Subscription with a Resource Group. In the Resource Group are

  • a Container Registry,
  • a Batch Account, and
  • a User Assigned Managed Identity.

The UAMI is assigned in the Identity blade of both the Container Registry and the Batch Account. It has been assigned the AcrPull role by an admin for my subscription.

I can pull the image to my local machine, so I know it exists. I have tried running a simple task on a pre-fetched python3.7-slim image from Docker Hub and succeeded, so the problem is somewhere between Batch and ACR.

Here is a minimal sample demonstrating the problem:

from azure.batch import BatchServiceClient
from azure.batch.batch_auth import SharedKeyCredentials
from azure.batch.models import (
  ComputeNodeIdentityReference,
  ContainerConfiguration,
  ContainerRegistry,
  ImageReference,
  JobAddParameter,
  PoolAddParameter,
  PoolInformation,
  VirtualMachineConfiguration,
)

if __name__ == '__main__':
  batch_service_client = BatchServiceClient(
    SharedKeyCredentials('batchtest2021', 'GZTn…………………………………pGJ gNE…………………………dvw=='),
    batch_url='https://batchtest2021.westeurope.batch.azure.com/',
  )
  pool_id = 'my_test_pool'
  new_pool = PoolAddParameter(
    id=pool_id,
    virtual_machine_configuration=VirtualMachineConfiguration(
      container_configuration=ContainerConfiguration(
        container_image_names=[
          'myprivateacr.azurecr.io/mydockerimage:latest',
        ],
        container_registries=[
          ContainerRegistry(
            registry_server='myprivateacr.azurecr.io',
            identity_reference=ComputeNodeIdentityReference(
              resource_id='/subscriptions/7bd2fd6e-1cb6-4db2-82fe-67c7ea3024cd/resourceGroups/SANDBOX/providers/Microsoft.ManagedIdentity/userAssignedIdentities/my_uami'
            ),
          ),
        ],
      ),
      image_reference=ImageReference(
        publisher='microsoft-azure-batch',
        offer='ubuntu-server-container',
        sku='20-04-lts',
        version='latest',
      ),
      node_agent_sku_id='batch.node.ubuntu 20.04',
    ),
    vm_size='STANDARD_A2M_V2',
    target_dedicated_nodes=2,
  )
  batch_service_client.pool.add(new_pool)

  job = JobAddParameter(id='sample_job_id', pool_info=PoolInformation(pool_id=pool_id))
  batch_service_client.job.add(job)

The code is based on the Batch Python Quickstart samples and the Batch documentation.

I have tried various steps in the Troubleshoot registry login guide without effect. I have no problems signing in to the ACR through Azure Shell, but that's with my regular user, not the UAMI of course.

GUIDs have been changed to protect the innocent.

Halp?

Answer:

When you use a managed identity with a pool, you have to assign the identity to the pool itself. Setting an identity on the Batch account lets the Batch service use that identity, but not the VMs within your pool. Note that for your use case (Azure Container Registry) you do not need the identity on the account at all, only on the pool.
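If you want to confirm this from inside a node, a rough check is to repeat the token request from the error message yourself. The sketch below rebuilds that URL and calls the instance metadata endpoint with the `Metadata: true` header it expects. The function names are made up for illustration, and my reading that a 400 here means the identity is not attached to the VM is an assumption, so look at the printed response body rather than the status code alone.

```python
import json
import urllib.error
import urllib.parse
import urllib.request

# Endpoint and query parameters are taken from the URL in the error message.
IMDS_TOKEN_ENDPOINT = "http://169.254.169.254/metadata/identity/oauth2/token"


def build_token_url(uami_resource_id: str) -> str:
    """Rebuild the token request URL for a user-assigned identity (hypothetical helper)."""
    query = urllib.parse.urlencode(
        {
            "api-version": "2018-02-01",
            "resource": "https://management.azure.com/",
            "mi_res_id": uami_resource_id,
        }
    )
    return f"{IMDS_TOKEN_ENDPOINT}?{query}"


def identity_is_attached(uami_resource_id: str, timeout: float = 5.0) -> bool:
    """Return True if the metadata endpoint issues a token for the identity.

    Only meaningful when run on the Azure VM itself; elsewhere the link-local
    address is unreachable and urlopen raises URLError.
    """
    request = urllib.request.Request(
        build_token_url(uami_resource_id),
        headers={"Metadata": "true"},  # the metadata endpoint requires this header
    )
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return "access_token" in json.load(response)
    except urllib.error.HTTPError as err:
        # Print the body; it usually says why the request was rejected.
        print(err.code, err.read().decode(errors="replace"))
        return False
```

Calling `identity_is_attached(...)` with the resource ID from the question on a pool node should reproduce the 400 while the identity is only set on the account.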

See the docs on assigning identities to pools:

https://docs.microsoft.com/azure/batch/managed-identity-pools
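As I understand those docs, the identity is attached when the pool is created through the Batch management API (the ARM resource), not through the `BatchServiceClient` used in the question. Here is a minimal sketch of what that pool resource body could look like, built as a plain dict so it runs without any Azure packages. The values are copied from the question; `build_pool_resource` is a made-up helper name, and the exact shape should be checked against the management API reference before use.

```python
import json

# Anonymized resource ID exactly as given in the question.
UAMI_RESOURCE_ID = (
    "/subscriptions/7bd2fd6e-1cb6-4db2-82fe-67c7ea3024cd/resourceGroups/SANDBOX"
    "/providers/Microsoft.ManagedIdentity/userAssignedIdentities/my_uami"
)


def build_pool_resource(uami_resource_id: str) -> dict:
    """Build a management-plane pool body with the identity on the pool (hypothetical helper)."""
    return {
        # This top-level block is the part missing from the question's pool:
        # it attaches the identity to the pool's VMs.
        "identity": {
            "type": "UserAssigned",
            "userAssignedIdentities": {uami_resource_id: {}},
        },
        "properties": {
            "vmSize": "STANDARD_A2M_V2",
            "deploymentConfiguration": {
                "virtualMachineConfiguration": {
                    "imageReference": {
                        "publisher": "microsoft-azure-batch",
                        "offer": "ubuntu-server-container",
                        "sku": "20-04-lts",
                        "version": "latest",
                    },
                    "nodeAgentSkuId": "batch.node.ubuntu 20.04",
                    "containerConfiguration": {
                        "type": "DockerCompatible",
                        "containerImageNames": [
                            "myprivateacr.azurecr.io/mydockerimage:latest",
                        ],
                        "containerRegistries": [
                            {
                                "registryServer": "myprivateacr.azurecr.io",
                                # Same identity, now referenced for the registry pull.
                                "identityReference": {"resourceId": uami_resource_id},
                            }
                        ],
                    },
                }
            },
            "scaleSettings": {"fixedScale": {"targetDedicatedNodes": 2}},
        },
    }


print(json.dumps(build_pool_resource(UAMI_RESOURCE_ID), indent=2))
```

The same structure can be expressed with the model classes in the `azure-mgmt-batch` package and sent with `BatchManagementClient.pool.create`. Once the pool exists, jobs and tasks can still be submitted with the `BatchServiceClient` from the question.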
