I want to list all the blobs in a container and ultimately load each blob's contents (each blob stores a CSV file) into a data frame. The blob service client appears to be the easiest way to list all the blobs, and this is what I have:
#!/usr/bin/env python3
import os
from azure.storage.blob import BlobServiceClient, BlobClient, ContainerClient
from pathlib import Path
from io import StringIO
import pandas as pd

def main():
    # Connection string and container name come from the environment
    connect_str = os.environ['AZURE_CONNECT_STR']
    container = os.environ['CONTAINER']
    print(connect_str + "\n")
    blob_service_client = BlobServiceClient.from_connection_string(connect_str)
    container_client = blob_service_client.get_container_client(container)
    # list_blobs() yields one properties object per blob in the container
    blob_list = container_client.list_blobs()
    for blob in blob_list:
        print("\t" + blob.name)

if __name__ == "__main__":
    main()
However, in the latest version of the blob storage client there appears to be no method that lets me get the actual contents of a blob. What code should I be using? There are other clients in the Python SDK for Azure, but getting a full list of the blobs in a container with them seems cumbersome.
CodePudding user response:
What you need to do is create an instance of BlobClient using the container_client and the blob's name. You can then call its download_blob method to download the blob.
Something like:
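for blob in blob_list:
    print("\t" + blob.name)
    blob_client = container_client.get_blob_client(blob.name)
    # download_blob() returns a StorageStreamDownloader; readall() returns the content as bytes
    data = blob_client.download_blob().readall()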
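To get from the downloaded bytes to the data frame the question asks for, something along these lines might work. This is only a sketch under a few assumptions: every blob is a CSV with the same columns, the files are small enough to hold in memory, and pandas' default parsing options suit them. The helper name blobs_to_dataframe and the source_blob column are made up for illustration and are not part of the Azure SDK.

```python
from io import BytesIO

import pandas as pd


def blobs_to_dataframe(container_client):
    """Download every blob in a container and concatenate them as CSVs.

    `container_client` is expected to behave like an
    azure.storage.blob.ContainerClient (hypothetical helper, not an SDK call).
    """
    frames = []
    for blob in container_client.list_blobs():
        blob_client = container_client.get_blob_client(blob.name)
        # download_blob() returns a downloader object; readall() yields bytes.
        data = blob_client.download_blob().readall()
        frame = pd.read_csv(BytesIO(data))
        # Illustrative extra column recording which blob each row came from.
        frame["source_blob"] = blob.name
        frames.append(frame)
    # An empty container gives an empty frame instead of an error from concat.
    return pd.concat(frames, ignore_index=True) if frames else pd.DataFrame()
```

With the container_client from the question's main(), this would be called as df = blobs_to_dataframe(container_client). If the blobs have different layouts, keeping a dict of blob name to frame instead of concatenating is probably the safer choice.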