Creating a directory inside another directory in blob storage

Time:12-23

I wanted to create a subdirectory inside a directory in one of my Azure Blob Storage containers. As far as I know, this cannot be done through the UI, so I created a Databricks notebook and executed the following command:

dbutils.fs.mkdirs("mnt/<containername>/directory/subdirectory")

The command runs without throwing any error, and it creates everything up to the directory level, but it does not create the subdirectory. All the mount points are correct.

Our team used the same code (back in 2021, I think) to create a subdirectory and it worked then, but now it does not. Can someone help me with this?

Thank you.

CodePudding user response:

You are probably mixing vanilla Blob Storage with [image not available]

Second, there is storage that is automatically attached to the cluster which I consider local. However, nothing is really local in the cloud!

%fs ls /

If you execute the above command, you will see the file system that is part of the data plane. If you have never seen it before, here is a diagram of both the control and data planes. What is missing from this diagram is that parts of DBFS can be mount points to remote storage. Also, if you use URLs, you talk directly to the remote storage service.

[Image: diagram of the Databricks control plane and data plane]
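To make the point about URLs concrete, here is a minimal sketch that builds the two URI forms commonly used to address Azure storage directly instead of going through a mount point. The helper name and the account, container, and path values are invented for illustration; only the wasbs:// and abfss:// URI layouts are standard.

```python
def storage_uri(account: str, container: str, path: str = "", hierarchical: bool = True) -> str:
    """Build a direct URI to Azure storage (hypothetical helper).

    hierarchical=True  -> ADLS Gen2 endpoint via the ABFS driver (abfss://)
    hierarchical=False -> plain Blob Storage endpoint via the WASB driver (wasbs://)
    """
    scheme, endpoint = ("abfss", "dfs") if hierarchical else ("wasbs", "blob")
    return f"{scheme}://{container}@{account}.{endpoint}.core.windows.net/{path.lstrip('/')}"


# Example values are placeholders, not real resources.
print(storage_uri("mystorageacct", "mycontainer", "directory/subdirectory"))
print(storage_uri("mystorageacct", "mycontainer", "directory", hierarchical=False))
```

A URI like this can usually be passed wherever a /mnt/... path would go, provided the cluster or session already holds credentials for the storage account.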

Azure Blob Storage is the foundation for both plain blob containers and containers with a hierarchical namespace. It is all the same foundation nowadays, but I suggest using ADLS Gen2 for both the RBAC and ACL security layers.

Please see the article below for details. Some people mount the storage when the cluster comes up, which can be done via additional cluster configuration. Others prefer to pass the credentials via Spark session variables within a notebook. I find this technique fragile: if you do not have access to the program that sets the Spark configuration, you do not have access to the remote storage.

https://docs.databricks.com/external-data/azure-storage.html

My guess from your statement above is that the storage was mounted at one time but the mount no longer exists. How did I determine this? It is tradition to mount storage under /mnt.

If you are using Blob Storage, you are probably using a Shared Access Signature (SAS). Those are time-based and expire. More details are needed to narrow down the issue.
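If a SAS token is involved, one quick check is whether it has already expired. The sketch below reads the `se` (signed expiry) field from a token's query string. It assumes an ad hoc SAS that carries its own `se` field (a SAS tied to a stored access policy may omit it), and the function names and sample token are made up for illustration.

```python
from datetime import datetime, timezone
from typing import Optional
from urllib.parse import parse_qs


def sas_expiry(sas_token: str) -> datetime:
    """Return the expiry time carried in the `se` field of a SAS token."""
    params = parse_qs(sas_token.lstrip("?"))
    raw = params["se"][0]  # KeyError if the token has no expiry field
    # Older Python versions do not accept a trailing "Z" in fromisoformat.
    parsed = datetime.fromisoformat(raw.replace("Z", "+00:00"))
    # A date-only value parses as naive; SAS times are UTC.
    if parsed.tzinfo is None:
        parsed = parsed.replace(tzinfo=timezone.utc)
    return parsed


def is_expired(sas_token: str, now: Optional[datetime] = None) -> bool:
    """True if the token's expiry is at or before `now` (defaults to current UTC time)."""
    now = now or datetime.now(timezone.utc)
    return sas_expiry(sas_token) <= now


# Fake token for illustration only; the signature is not valid.
token = "sv=2021-06-08&sp=rwdl&se=2022-12-31T23:59:59Z&sig=FAKE"
print(sas_expiry(token), is_expired(token))
```

An expired token would explain a mount that worked in 2021 and fails now, but this only checks the timestamp; it says nothing about permissions or a revoked key.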

If you are using ADLS, you are probably using a service principal to mount storage. This means the principal needs both Storage Blob Data Contributor rights at the RBAC layer and "rwx" at the ACL layer. Use Azure Storage Explorer to assign the rights.

https://learn.microsoft.com/en-us/azure/storage/blobs/data-lake-storage-access-control-model

If you are curious, use the following command to see existing mount points.

display(dbutils.fs.mounts())
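Once you have the list of mounts, you can check whether the path you are writing to actually sits under one of them. This is a rough sketch: `find_mount` is a hypothetical helper, and the `FakeMount` tuples stand in for the entries returned by dbutils.fs.mounts(), which expose `mountPoint` and `source` attributes.

```python
from collections import namedtuple


def find_mount(mounts, path):
    """Return the mount entry whose mountPoint is the longest prefix of path, or None."""
    target = "/" + path.strip("/")
    best = None
    for m in mounts:
        point = m.mountPoint.rstrip("/")
        if not point:
            continue  # skip the root entry "/", which is not a remote mount
        if target == point or target.startswith(point + "/"):
            if best is None or len(point) > len(best.mountPoint.rstrip("/")):
                best = m
    return best


# Stand-in for the objects returned by dbutils.fs.mounts(); values are invented.
FakeMount = namedtuple("FakeMount", ["mountPoint", "source"])
mounts = [
    FakeMount("/mnt/raw", "wasbs://raw@examplestore.blob.core.windows.net"),
    FakeMount("/databricks-datasets", "databricks-datasets"),
]
print(find_mount(mounts, "mnt/raw/directory/subdirectory"))
print(find_mount(mounts, "mnt/missing/directory"))
```

In a notebook you would pass dbutils.fs.mounts() as the first argument. A result of None suggests the path is not backed by the container at all, so a directory created there would not show up in the storage account.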

In summary, there are many ways to play with storage. More details are needed to narrow down your exact problem.

CodePudding user response:

In Azure Blob storage, you can create a "virtual directory" by creating a blob with a name that includes a forward slash (/) character. For example, if you have a container named "my-container", you can create a virtual directory named "dir1/dir2" by creating a blob with the name "dir1/dir2/".

To create a virtual directory using the Azure portal, follow these steps:

1. Navigate to the Azure portal and sign in.

2. In the left-hand menu, click "Storage accounts" and then select the storage account that contains the container where you want to create the virtual directory.

3. In the storage account overview page, click the "Containers" link.

4. Click the name of the container where you want to create the virtual directory.

5. Click the "Upload" button at the top of the container page.

6. In the "Upload blob" dialog box, select the "Folder" option.

7. Enter the name of the virtual directory you want to create in the "Folder name" field. For example, to create a virtual directory named "dir1/dir2", enter "dir1/dir2" in the field.

8. Click the "Select" button to create the virtual directory.

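To create a virtual directory using the Azure Storage SDK for your programming language, you can upload an empty blob whose name is the virtual directory path. For example, in Python with the azure-storage-blob v12 package, you could use the upload_blob method:

from azure.storage.blob import BlobClient

# Placeholders: supply your own account URL and credential (account key, SAS token, or token credential).
blob_client = BlobClient(account_url="<your-account-url>", container_name="my-container", blob_name="dir1/dir2/", credential="<your-credential>")
# Upload a zero-length blob; the trailing "/" in its name makes it act as a directory placeholder.
blob_client.upload_blob(b"", overwrite=True)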

This will create a virtual directory named "dir1/dir2" in the container "my-container".

Note that virtual directories in Azure Blob storage are not actual directories, but rather a way to create a hierarchical structure for your blobs. You cannot create actual subdirectories inside a virtual directory, but you can create blobs with names that include the virtual directory as a prefix. For example, you could create a blob with the name "dir1/dir2/my-blob.txt". This would be equivalent to creating a file named "my-blob.txt" inside a directory named "dir2" that is located inside a directory named "dir1".
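To illustrate why a virtual directory is only a naming convention, the sketch below emulates a one-level listing over a flat list of blob names using "/" as the delimiter. It is a plain-Python emulation, not a call to the storage service; `list_level` and the sample names are hypothetical.

```python
def list_level(blob_names, prefix=""):
    """Split the names under `prefix` into virtual directories and blobs one level down."""
    dirs, blobs = set(), []
    for name in blob_names:
        if not name.startswith(prefix) or name == prefix:
            continue  # outside the prefix, or the placeholder blob itself
        rest = name[len(prefix):]
        head, sep, _ = rest.partition("/")
        if sep:
            dirs.add(prefix + head + "/")  # deeper name: report only its first segment
        else:
            blobs.append(name)             # blob directly under the prefix
    return sorted(dirs), sorted(blobs)


# A flat list of blob names; there are no directory objects in it.
names = ["dir1/dir2/", "dir1/dir2/my-blob.txt", "dir1/readme.txt", "top.txt"]
print(list_level(names))
print(list_level(names, "dir1/"))
print(list_level(names, "dir1/dir2/"))
```

The "directories" in the result are derived entirely from the blob names, which is why removing the last blob under a prefix also makes the virtual directory disappear from listings.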
