Find Blobs in Azure Storage Container in multi level folder structure using C#

Time:12-18

I am having folder structure like

(Container)->(1)School->(2)Staffs
                            ->(2.a)OfficeStaffs
                                     -> (2.a.i)Admin ->(Blobs)
                                     -> (2.a.ii)Clerk->(Blobs)
                    
                            ->(2.b)Teachers
                                     ->(2.b.i)SeniorStudents ->(Year)->AttendanceReport.xlx
                                                                     ->ExamReport.xlx   
                                     ->(2.b.ii)JuniorStudents ->(Year)->AttendanceReport.xlx 
                                                                     ->ExamReport.xlx

School is my parent folder and all the other folders are subfolders. I need to find blobs by folder name, and that folder may sit anywhere in the middle of the path. For example, the UI lets the user search by staff type, by teachers, by student type, or by year, and the user does not have to select the folders level by level. If the user selects Teachers, I need to display the Teachers folder and all the student folders under it, with their blobs. If the user selects a year, I need to get all the blobs that belong to that year folder. In that case I only receive the year value from the user and do not know its parent folders, so I have to retrieve the blobs based on the year alone. If the user selects both OfficeStaffs and Teachers, I need to retrieve all the subfolders and blobs from both folders.

I tried using the blob prefix to match a middle folder, but had no luck. It always expects the path from the initial folder, followed by the next folders in order, so I could not match a folder in the middle.

BlobContainerClient client = new BlobContainerClient(connectionString, containerName);
List<FileData> files = new List<FileData>();

await foreach (BlobItem file in client.GetBlobsAsync(prefix: "SeniorStudents"))
{
    files.Add(new FileData
    {
        FileName = file.Name
    });
}

This does not return the blobs under the SeniorStudents folder; the result is empty. Please help me with this. Thanks.

CodePudding user response:

Fetching the folders from container

string connectionString = "Connection String";
string containerName = "containerName";
List<string> dir = new List<string>();
List<FileData> files = new List<FileData>();

BlobServiceClient blobServiceClient = new BlobServiceClient(connectionString);
BlobContainerClient containerClient = blobServiceClient.GetBlobContainerClient(containerName);

// The prefix must match the start of the blob name.
var blobItems = containerClient.GetBlobs(prefix: "test");

foreach (var file in blobItems)
{
    Console.WriteLine(file.Name);
    string[] subNames = file.Name.Split('/');
    Console.WriteLine(subNames.Length);

    files.Add(new FileData
    {
        FileName = file.Name
    });

    // Every segment except the last one (the file name) is a virtual folder.
    for (int i = 0; i < subNames.Length - 1; i++)
    {
        if (!dir.Contains(subNames[i]))
        {
            dir.Add(subNames[i]);
        }
    }
}

// Print the distinct folder names that were found.
foreach (var item in dir)
{
    Console.WriteLine(item);
}

[Screenshots omitted: the code fetching the files and folder structure, sample output of the C# code, the blob storage in Storage Explorer, and the container in the Azure Portal.]

CodePudding user response:

"I am having folder structure like"

No, you don't. Unless hierarchical namespace is enabled, all the (sub)folders you see are virtual. Those folders are defined by the name of the blob: each / in a blob name is shown as a virtual folder in Storage Explorer.

In your case you have multiple blobs in a container:

  1. School/Staffs/OfficeStaffs/Admin/Blob1.ext
  2. School/Staffs/OfficeStaffs/Clerk/Blob2.ext
  3. School/Staffs/Teachers/SeniorStudents/2022/AttendanceReport.xlx
  4. School/Staffs/Teachers/SeniorStudents/2022/ExamReports.xlx
  5. School/Staffs/Teachers/JuniorStudents/2022/AttendanceReport.xlx
  6. School/Staffs/Teachers/JuniorStudents/2022/ExamReports.xlx

As you can see, it is a flat list. When you try to find blobs based on a prefix, you need to remember that it is the equivalent of matching a string using the C# string.StartsWith method.

So with the prefix School/Staffs/OfficeStaffs/Admin/ you will find blob 1, and the prefix School/Staffs/Teachers will give you blobs 3 to 6. The prefix Staffs does not list any blobs, as no blob name starts with the text Staffs.
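To make the StartsWith comparison concrete, here is a minimal local sketch that needs no storage account. It assumes the service-side prefix filter behaves like an ordinal, case-sensitive match on the full blob name; PrefixDemo and ListByPrefix are made-up names for illustration, not part of the Azure SDK.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class PrefixDemo
{
    // The six blob names from the list above, as one flat collection.
    public static readonly string[] BlobNames =
    {
        "School/Staffs/OfficeStaffs/Admin/Blob1.ext",
        "School/Staffs/OfficeStaffs/Clerk/Blob2.ext",
        "School/Staffs/Teachers/SeniorStudents/2022/AttendanceReport.xlx",
        "School/Staffs/Teachers/SeniorStudents/2022/ExamReports.xlx",
        "School/Staffs/Teachers/JuniorStudents/2022/AttendanceReport.xlx",
        "School/Staffs/Teachers/JuniorStudents/2022/ExamReports.xlx",
    };

    // Hypothetical local stand-in for the prefix filter (assumption):
    // an ordinal "starts with" test on the whole blob name.
    public static List<string> ListByPrefix(string prefix) =>
        BlobNames.Where(n => n.StartsWith(prefix, StringComparison.Ordinal)).ToList();

    public static void Main()
    {
        Console.WriteLine(ListByPrefix("School/Staffs/OfficeStaffs/Admin/").Count); // 1
        Console.WriteLine(ListByPrefix("School/Staffs/Teachers").Count);            // 4
        Console.WriteLine(ListByPrefix("Staffs").Count);                            // 0
        Console.WriteLine(ListByPrefix("SeniorStudents").Count);                    // 0
    }
}
```

The last call mirrors the prefix from the question: no name starts with SeniorStudents, so the result is empty.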

In your case, that means you will have to get all blobs and split their names, for example using string.Split(). The code below finds all blobs that are in a folder named SeniorStudents, no matter at what level that virtual folder sits:

    using System.Collections.Generic;
    using System.Linq;

    class FileData
    {
        public string FileName { get; set; }

        // All name segments except the last one (the file name) are virtual folders.
        public IEnumerable<string> Folders => FileName.Split('/').SkipLast(1);
    }

...

    // No prefix: enumerate every blob in the container.
    await foreach (BlobItem file in client.GetBlobsAsync())
    {
        files.Add(new FileData
        {
            FileName = file.Name
        });
    }

    var targetFiles = files.Where(f => f.Folders.Contains("SeniorStudents"));

In the above example, if you want all 2022 files for all teachers you can do:

var targetFiles = files.Where(f => f.Folders.Contains("Teachers") && f.Folders.Contains("2022"));
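The question also asks for the case where the user selects several folders at once (OfficeStaffs and Teachers), which is an OR rather than an AND. A rough sketch of one way to combine both kinds of criteria on the already-listed names follows. FolderSearch, Search, anyOf and allOf are names invented for this example, and the matching is done in memory on plain strings.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class FolderSearch
{
    public static readonly string[] SampleNames =
    {
        "School/Staffs/OfficeStaffs/Admin/Blob1.ext",
        "School/Staffs/OfficeStaffs/Clerk/Blob2.ext",
        "School/Staffs/Teachers/SeniorStudents/2022/AttendanceReport.xlx",
        "School/Staffs/Teachers/SeniorStudents/2022/ExamReports.xlx",
        "School/Staffs/Teachers/JuniorStudents/2022/AttendanceReport.xlx",
        "School/Staffs/Teachers/JuniorStudents/2022/ExamReports.xlx",
    };

    // Virtual folders of a blob name: every segment except the final file name.
    public static string[] Folders(string blobName)
    {
        string[] parts = blobName.Split('/');
        return parts.Take(parts.Length - 1).ToArray();
    }

    // Keeps a name when it sits under at least one folder in anyOf (OR)
    // and under every folder in allOf (AND). An empty list imposes no constraint.
    public static List<string> Search(
        IEnumerable<string> blobNames, string[] anyOf, string[] allOf)
    {
        return blobNames.Where(name =>
        {
            string[] folders = Folders(name);
            bool anyMatch = anyOf.Length == 0 || anyOf.Any(f => folders.Contains(f));
            bool allMatch = allOf.All(f => folders.Contains(f));
            return anyMatch && allMatch;
        }).ToList();
    }

    public static void Main()
    {
        // OfficeStaffs OR Teachers: all six blobs.
        Console.WriteLine(Search(SampleNames, new[] { "OfficeStaffs", "Teachers" }, new string[0]).Count); // 6

        // Year only, parent folders unknown: the four 2022 blobs.
        Console.WriteLine(Search(SampleNames, new string[0], new[] { "2022" }).Count); // 4

        // Admin only: one blob.
        Console.WriteLine(Search(SampleNames, new[] { "Admin" }, new string[0]).Count); // 1
    }
}
```

With the FileData class above you would pass files.Select(f => f.FileName) as the first argument.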

Alternative

If you have lots of blobs, the above method forces you to perform an inefficient query, for example to get 5 results out of 2000 blobs, because you need to list all the blobs before you can determine whether they match the criteria.

As an alternative, you might want to add tags to your blobs, each tag representing a folder or category. It is then easy to find all blobs that have a specific tag. Beware of the limits: a blob can have at most 10 tags, see the docs.
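A rough sketch of the tag approach with the Azure.Storage.Blobs package could look like the following. The tag keys (StaffType, StudentType, Year) and the helper names are assumptions made up for this example. SetTagsAsync and FindBlobsByTagsAsync are SDK calls, but check the filter syntax against the docs for your SDK version. To my knowledge the filter expression combines conditions with AND only, so an OR search such as OfficeStaffs or Teachers would need one query per value.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;
using Azure.Storage.Blobs;
using Azure.Storage.Blobs.Models;

public static class BlobTagSearch
{
    // Builds a filter such as: @container = 'school' AND "Year" = '2022'
    // Assumes keys and values contain no quote characters.
    public static string BuildFilter(string containerName, IDictionary<string, string> tags)
    {
        var clauses = new List<string> { $"@container = '{containerName}'" };
        clauses.AddRange(tags.Select(t => $"\"{t.Key}\" = '{t.Value}'"));
        return string.Join(" AND ", clauses);
    }

    // Sets the tags on one blob. Note that this replaces any tags already on the blob.
    public static async Task TagBlobAsync(
        BlobContainerClient container, string blobName, IDictionary<string, string> tags)
    {
        await container.GetBlobClient(blobName).SetTagsAsync(tags);
    }

    // Returns the names of all blobs in the container that carry every given tag.
    public static async Task<List<string>> FindAsync(
        BlobServiceClient service, string containerName, IDictionary<string, string> tags)
    {
        var names = new List<string>();
        string filter = BuildFilter(containerName, tags);

        await foreach (TaggedBlobItem item in service.FindBlobsByTagsAsync(filter))
        {
            names.Add(item.BlobName);
        }

        return names;
    }
}
```

For example, a blob uploaded as School/Staffs/Teachers/SeniorStudents/2022/ExamReports.xlx could be tagged with StaffType = Teachers, StudentType = SeniorStudents and Year = 2022. A search by year alone then becomes FindAsync(service, containerName, new Dictionary<string, string> { ["Year"] = "2022" }), without listing the whole container.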
