Iterate the files in an ADLS2 Azure Datalake Directory given a SAS url

Time:12-22

I'd like to download the files from an ADLS2 storage blob directory. I only have a SAS URL to that directory, and I would like to recursively download all the files in it.

It is clear how to do this given the storage credentials, and there are many examples that show how, but I couldn't find any that use a SAS URL.

Any clues or documentation links would be much appreciated!

CodePudding user response:

I reproduced this in my environment and got the expected results with the function below. The code is taken from @ROGER ZANDER's blog:

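function DownloadBlob {
    param (
        [Parameter(Mandatory)]
        [string]$URL,
        [string]$Path = (Get-Location)
    )

    # Split the SAS URL into the resource URI and the SAS token.
    $uri = $URL.split('?')[0]
    $sas = $URL.split('?')[1]
    # Build the List Blobs request URL and fetch the XML listing.
    $newurl = $uri + "?restype=container&comp=list&" + $sas
    $body = Invoke-RestMethod -uri $newurl
    # Drop anything before the first '<' (such as a byte-order mark) before parsing.
    $xml = [xml]$body.Substring($body.IndexOf('<'))
    $files = $xml.ChildNodes.Blobs.Blob.Name
    # Print each blob name, create its local folder, and download the blob.
    $files | ForEach-Object { $_; New-Item (Join-Path $Path (Split-Path $_)) -ItemType Directory -ea SilentlyContinue | Out-Null
        (New-Object System.Net.WebClient).DownloadFile($uri + "/" + $_ + "?" + $sas, (Join-Path $Path $_))
     }
}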

Then call the DownloadBlob function and pass it the SAS URL.
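
The URL handling inside the function above can also be sketched in Python with only the standard library. This is a minimal sketch, not the blog author's code: the helper names (split_sas_url, build_list_url, build_blob_url, blob_names) are hypothetical, it assumes the SAS permits listing, and it assumes the listing response has the Blobs/Blob/Name layout that the PowerShell code reads. Percent-encoding the blob name is an addition that the PowerShell version does not make.

```python
from urllib.parse import quote
import xml.etree.ElementTree as ET


def split_sas_url(sas_url: str) -> tuple[str, str]:
    """Split a SAS URL into (resource URL, SAS token) at the first '?'."""
    base, _, token = sas_url.partition("?")
    return base, token


def build_list_url(sas_url: str) -> str:
    """Build the listing request URL, mirroring $newurl in the PowerShell code."""
    base, token = split_sas_url(sas_url)
    return f"{base}?restype=container&comp=list&{token}"


def build_blob_url(sas_url: str, blob_name: str) -> str:
    """Build the download URL for one blob, reusing the same SAS token."""
    base, token = split_sas_url(sas_url)
    # quote() keeps '/' but escapes spaces and other unsafe characters
    # (an addition; the PowerShell version concatenates the raw name).
    return f"{base}/{quote(blob_name)}?{token}"


def blob_names(list_response_xml: str) -> list[str]:
    """Extract every Blobs/Blob/Name value from the listing XML."""
    # Skip anything before the first '<', for example a byte-order mark.
    root = ET.fromstring(list_response_xml[list_response_xml.index("<"):])
    return [name.text for name in root.findall("./Blobs/Blob/Name")]
```

Each name returned by blob_names would then be passed to build_blob_url and fetched with any HTTP client. Large listings are returned in pages, so a complete version would also need to follow the continuation marker in the response, which this sketch does not do.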


CodePudding user response:

Use the DataLakeFileClient class: https://learn.microsoft.com/en-us/dotnet/api/azure.storage.files.datalake.datalakefileclient?view=azure-dotnet

I don't know whether there is a method for downloading a whole directory from blob storage, but you can create a download folder and download all the files in the directory in a loop. It takes a few steps:

Create a service client to access the data lake with the SAS, using the DataLakeServiceClient(Uri, AzureSasCredential) constructor. DataLakeFileClient(Uri, AzureSasCredential) takes the same arguments if you want a client for a single file.

Then, to access the container, get a DataLakeFileSystemClient from the service client:

DataLakeFileSystemClient fileSystem = client.GetFileSystemClient(_containerName);

To get the directory, use DataLakeDirectoryClient directoryClient = fileSystem.GetDirectoryClient(directoryName);

To loop through the items in the directory use the loop below:

foreach (PathItem pathItem in directoryClient.GetPaths())
        {
            // pathItem.Name is the full path; keep only the part after the last "/".
            int pos = pathItem.Name.LastIndexOf("/") + 1;
            DataLakeFileClient fileClient = directoryClient.GetFileClient(pathItem.Name.Substring(pos, pathItem.Name.Length - pos));

            await fileClient.ReadToAsync(downloadpath + @"\" + pathItem.Name);

        }
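
For a recursive download, each listed path also has to be mapped to a local file, and directory entries have to be skipped. The following is a minimal Python sketch of just that mapping step, under stated assumptions: local_target and plan_downloads are hypothetical helper names, and the input is assumed to be (remote path, is-directory) pairs taken from whatever listing call is used. It does not talk to Azure.

```python
from pathlib import Path, PurePosixPath


def local_target(download_root: str, remote_path: str, remote_dir: str = "") -> Path:
    """Map a remote path such as 'data/sub/a.txt' to a local file path.

    remote_dir, if given, is stripped from the front so the local tree
    mirrors the layout below the listed directory.
    """
    relative = PurePosixPath(remote_path)
    if remote_dir:
        relative = relative.relative_to(PurePosixPath(remote_dir))
    # Join part by part so the result uses the local OS separator.
    return Path(download_root).joinpath(*relative.parts)


def plan_downloads(entries, download_root: str, remote_dir: str = ""):
    """Return (remote_path, local_path) pairs for every file entry.

    entries is an iterable of (remote_path, is_directory) pairs.
    """
    plan = []
    for remote_path, is_directory in entries:
        if is_directory:
            continue  # directories are created locally, not downloaded
        plan.append((remote_path, local_target(download_root, remote_path, remote_dir)))
    return plan
```

Before writing each file, the caller would create its folder with local_path.parent.mkdir(parents=True, exist_ok=True), so that nested paths do not fail on a missing local directory.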