Ruby: How to have a wildcard for a subdirectory in a path

Time:12-01

I am writing a helper in my Rails app and I need to provide a path with a wildcard for a subdirectory. The helper pulls image files from an Amazon bucket. The file paths look like this:

 images/patient_id/some_folder/image_files.dcm

Here is my helper

def get_files(image_folder)
  connection = Fog::Storage.new(
    provider: 'AWS',
    aws_access_key_id: AWS_ACCESS_KEY_ID,
    aws_secret_access_key: AWS_SECRET_ACCESS_KEY
  )
  connection.directories.get(AMAZON_BUCKET, prefix: "images/#{patient_number}/**").files.map do |file|
    file.key
  end
end

I have tried many permutations. I'd be grateful for some help in expressing a wildcard at the "some_folder" level. Many thanks.

Answer:

While S3 object names may resemble file paths as you would see them in a standard OS, in reality all objects are stored at the same "level" in a flat structure (the bucket), and the "path" is just the unique identifier (the key) used to retrieve a given object.

According to the AWS page "What is Object Storage":

Comparing object storage and file storage

The primary differences between object and file storage are data structure and scalability. File storage is organized into hierarchy with directories and folders. File storage also follows strict file protocols, such as SMB, NFS, or Lustre. Object storage uses a flat structure with metadata and a unique identifier for each object that makes it easier to find among potentially billions of other objects.

Amazon S3 Features:

Amazon S3’s flat, non-hierarchical structure and various management features are helping customers of all sizes and industries organize their data in ways that are valuable to their businesses and teams. All objects are stored in S3 buckets and can be organized with shared names called prefixes...

With that in mind, there is no need for "glob"-style traversal, because there is nothing to traverse. The prefix is matched literally against the start of each key, so a trailing "**" is treated as part of the name rather than as a wildcard. All you need to do is provide the prefix for the highest "level" you wish to retrieve, and the service will return every object whose key starts with that prefix.
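To make the flat model concrete, here is a minimal sketch in plain Ruby that makes no AWS calls. The keys and the keys_with_prefix helper are made up for illustration; the helper only imitates what a prefix listing does, which is a literal match on the start of each key.

```ruby
# Hypothetical object keys. Conceptually a bucket is just a flat list of these.
KEYS = [
  "images/1001/scan_a/slice_001.dcm",
  "images/1001/scan_b/slice_001.dcm",
  "images/1002/scan_a/slice_001.dcm",
  "reports/1001/summary.pdf"
].freeze

# Illustrative stand-in for a prefix listing: a literal "starts with" test.
def keys_with_prefix(keys, prefix)
  keys.select { |key| key.start_with?(prefix) }
end

# Everything "under" the patient, regardless of the middle folder name.
patient_keys = keys_with_prefix(KEYS, "images/1001/")

# The "**" is taken literally, so no key starts with this string.
glob_keys = keys_with_prefix(KEYS, "images/1001/**")

puts patient_keys
puts "glob-style prefix matched #{glob_keys.size} keys"
```

The first call returns both scan_a and scan_b objects for patient 1001, which is the effect the wildcard was meant to achieve.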

In your case you want everything "under" "images/#{patient_number}/", so that string, with no wildcard, is your prefix.

Example:

connection
  .directories
  .get(AMAZON_BUCKET, prefix:"images/#{patient_number}/")
  .files.map(&:key)
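If you still need glob-like narrowing after the prefix listing (for example, only .dcm files exactly one folder below the patient), one option is to filter the returned keys on the client side. The following is a sketch under that assumption, using Ruby's standard File.fnmatch; the filter_keys helper and the sample keys are hypothetical and stand in for the result of .files.map(&:key).

```ruby
# Hypothetical keys, standing in for what the prefix listing returned.
listed_keys = [
  "images/1001/scan_a/slice_001.dcm",
  "images/1001/scan_a/notes.txt",
  "images/1001/scan_b/slice_002.dcm",
  "images/1001/scan_b/nested/slice_003.dcm"
]

# Keep only the keys matching a glob pattern.
# File::FNM_PATHNAME stops "*" from matching across "/" separators,
# so each "*" covers exactly one path segment.
def filter_keys(keys, pattern)
  keys.select { |key| File.fnmatch(pattern, key, File::FNM_PATHNAME) }
end

dcm_keys = filter_keys(listed_keys, "images/1001/*/*.dcm")
puts dcm_keys
```

Here the wildcard at the "some_folder" level is applied locally, after S3 has already narrowed the listing by prefix, so the service itself never sees a glob.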