copy csv files from azure blob storage A to azure blob storage B

Time:12-27

I want to copy files from blob storage A to blob storage B, with the condition that only new files are copied/written to storage B. I'm using the BlockBlobService package to list all files in blob storage A, but does this package also have a function to copy/write a new file to another blob storage? I'm writing this in Python, by the way. Please help me out, I'm a bit helpless now. I tried to use the DataLakeServiceClient package to write a file to Azure blob storage B, but DataLakeServiceClient is not compatible with BlockBlobService, so I don't know what to do.

If you have tried another method to do the same thing, please share your wisdom and knowledge with me.

CodePudding user response:

I would suggest trying the azcopy tool. It supports copying data between storage accounts. For example:

azcopy copy 'https://sourceacc.blob.core.windows.net/container/dir' 'https://destacc.blob.core.windows.net/container' --recursive

Then add the --overwrite flag, set to false or ifSourceNewer, to specify the behavior for blobs that already exist at the destination:

--overwrite (string) Overwrite the conflicting files and blobs at the destination if this flag is set to true. Possible values include 'true', 'false', 'prompt', and 'ifSourceNewer'. For destinations that support folders, conflicting folder-level properties will be overwritten if this flag is 'true' or if a positive response is provided to the prompt. (default "true")

See this doc for how to get started.
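
Since the question asks for Python, here is a minimal sketch of driving the same azcopy command from a script. The helper names build_azcopy_command and run_azcopy are hypothetical, the URLs are the placeholders from the example above, and it assumes azcopy is installed, on the PATH, and already authorized (for instance with SAS tokens appended to the URLs).

```python
import subprocess

# Values of --overwrite listed in the flag description quoted above.
ALLOWED_OVERWRITE = {"true", "false", "prompt", "ifSourceNewer"}


def build_azcopy_command(source_url, dest_url, overwrite="false"):
    """Build the argument list for `azcopy copy` (hypothetical helper)."""
    if overwrite not in ALLOWED_OVERWRITE:
        raise ValueError(f"unsupported --overwrite value: {overwrite!r}")
    return [
        "azcopy", "copy", source_url, dest_url,
        "--recursive",
        f"--overwrite={overwrite}",
    ]


def run_azcopy(source_url, dest_url, overwrite="false"):
    """Run azcopy; raises CalledProcessError if it exits with an error."""
    cmd = build_azcopy_command(source_url, dest_url, overwrite)
    return subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # Placeholder URLs taken from the example above.
    print(" ".join(build_azcopy_command(
        "https://sourceacc.blob.core.windows.net/container/dir",
        "https://destacc.blob.core.windows.net/container",
    )))
```

With overwrite="false", blobs that already exist at the destination are left alone, so only new files are written. With "ifSourceNewer", existing blobs are also replaced when the source copy is more recent.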

CodePudding user response:

After reproducing this on my end, I was able to achieve it using get_blob_to_path and create_blob_from_path of BlockBlobService. Below is the complete code that worked for me.

# BlockBlobService belongs to the legacy azure-storage-blob SDK (v2.1 or earlier).
from azure.storage.blob import BlockBlobService
import os

SOURCE_ACCOUNT_NAME = "<source_Account_Name>"
SOURCE_CONTAINER_NAME = "<source-container>"
SOURCE_SAS_TOKEN = "<Source_Storage_Account_SAS_Token>"

DESTINATION_ACCOUNT_NAME = "<destination_Account_Name>"
DESTINATION_CONTAINER_NAME = "<destination-container>"
DESTINATION_SAS_TOKEN = "<Destination_Storage_Account_SAS_Token>"

# One client per storage account, each authenticated with its own SAS token.
source_blob_service = BlockBlobService(account_name=SOURCE_ACCOUNT_NAME, account_key=None, sas_token=SOURCE_SAS_TOKEN)
destination_blob_service = BlockBlobService(account_name=DESTINATION_ACCOUNT_NAME, account_key=None, sas_token=DESTINATION_SAS_TOKEN)

# Download each source blob to a temporary local file, upload it to the
# destination under the same blob name, then delete the local copy.
generator = source_blob_service.list_blobs(SOURCE_CONTAINER_NAME)
for blob in generator:
    blobname = blob.name
    # Blob names may contain "/", so keep only the last part for the local file.
    local_path = os.path.basename(blobname)
    source_blob_service.get_blob_to_path(SOURCE_CONTAINER_NAME, blobname, local_path, open_mode='wb')
    destination_blob_service.create_blob_from_path(DESTINATION_CONTAINER_NAME, blobname, local_path)
    os.remove(local_path)
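
The loop above copies every blob in the source container. To cover the "only new files" condition from the question, one possible sketch is to list the destination first and skip names that are already there. The function copy_new_blobs is hypothetical, and it assumes that "new" means "no blob with the same name exists at the destination". It accepts any two service objects exposing list_blobs, get_blob_to_path and create_blob_from_path, such as the two BlockBlobService instances created above.

```python
import os
import tempfile


def copy_new_blobs(source_service, dest_service, source_container, dest_container):
    """Copy blobs present in the source container but missing at the destination.

    Returns the list of blob names that were copied.
    """
    # Names already present at the destination; these are skipped.
    existing = {blob.name for blob in dest_service.list_blobs(dest_container)}
    copied = []
    with tempfile.TemporaryDirectory() as tmpdir:
        # A single scratch file is reused, so blob names containing "/"
        # never need matching local folders.
        local_path = os.path.join(tmpdir, "blob.tmp")
        for blob in source_service.list_blobs(source_container):
            if blob.name in existing:
                continue
            source_service.get_blob_to_path(source_container, blob.name, local_path)
            dest_service.create_blob_from_path(dest_container, blob.name, local_path)
            copied.append(blob.name)
    return copied
```

With the variables defined above, it could be called as copy_new_blobs(source_blob_service, destination_blob_service, SOURCE_CONTAINER_NAME, DESTINATION_CONTAINER_NAME). Comparing by name alone will not pick up a source file that changed after it was first copied. Detecting that would need a comparison of timestamps or content, which this sketch does not attempt.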
