AWS SageMaker Notebook's Default S3 Bucket - Can't Access Uploaded Files within Notebook


In SageMaker Studio, I created directories and uploaded files to my SageMaker default S3 bucket using the GUI, and I was exploring how to work with those uploaded files from a SageMaker Studio notebook.

Within the SageMaker Studio Notebook, I ran

import sagemaker
import boto3

sess = sagemaker.Session()
bucket = sess.default_bucket()  # sagemaker-abcdef
prefix = "folderJustBelowRoot"

conn = boto3.client('s3')
conn.list_objects(Bucket=bucket, Prefix=prefix)
# This returns a response dictionary whose metadata includes
# 'HTTPStatusCode': 200 and 'server': 'AmazonS3', i.e. the request-response was successful.

What I don't understand is why the 'Contents' key and its value are missing from the conn.list_objects response dictionary.

And when I go to my SageMaker default bucket in the S3 console, I am wondering why my uploaded files do not appear.

===============================================================

I was expecting

  • the response from conn.list_objects(Bucket=bucket, Prefix=prefix) to contain the 'Contents' key (within my SageMaker Studio Notebook)

  • the S3 console to show the files I uploaded to my SageMaker default bucket

CodePudding user response:

Question 2: When I go to my SageMaker default bucket in the S3 console, why are my uploaded files not appearing?

It seems that when you upload files from your local desktop/laptop onto AWS SageMaker Studio using the GUI, your files land on the Elastic Block Store (EBS) storage of your SageMaker Studio instance, not in S3.

To access those items within your SageMaker Studio instance, use their paths relative to the Studio file system (a short sketch follows this list):

  • Folder path - "subFolderLayer1/subFolderLayer2/subFolderLayer3" => to access 'subFolderLayer3'
  • File path - "subFolderLayer1/subFolderLayer2/subFolderLayer3/fileName.extension" => to access 'fileName.extension' within your subFolderLayers
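
As a minimal sketch (the folder and file names below are the placeholders from the list above), such a file can be opened with ordinary Python file I/O, because it lives on the instance's storage rather than in S3:

import os

# Placeholder path, relative to the notebook's working directory on the
# Studio instance's storage (not S3).
local_path = "subFolderLayer1/subFolderLayer2/subFolderLayer3/fileName.extension"

print(os.path.exists(local_path))  # True if the GUI upload landed on the instance storage

with open(local_path, "rb") as f:
    data = f.read()

print(f"Read {len(data)} bytes from {local_path}")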

=========

To access files in the default S3 bucket for your AWS SageMaker session, first identify that bucket:

import sagemaker

sess = sagemaker.Session()
bucket = sess.default_bucket()  # sagemaker-abcdef

Then go to that bucket in the S3 console and upload your files and folders there. When you have done that, move on to the response for question 1.
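
Alternatively, here is a minimal sketch of uploading from within the notebook itself using boto3 (the local file name and the prefix are placeholder assumptions):

import boto3
import sagemaker

sess = sagemaker.Session()
bucket = sess.default_bucket()        # sagemaker-abcdef
prefix = "folderJustBelowYourBucket"  # placeholder prefix, reused below

s3 = boto3.client("s3")
# upload_file(local_path, bucket_name, object_key)
s3.upload_file("fileName.extension", bucket, f"{prefix}/fileName.extension")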

=================================================================

Question 1: What I don't understand is why the 'Contents' key and its value are missing from the conn.list_objects response dictionary.

prefix = "folderJustBelowYourBucket"
conn = boto3.client('s3')
conn.list_objects(Bucket=bucket, Prefix=prefix) 

The conn.list_objects response dictionary now contains a 'Contents' key, whose value is a list of metadata - one metadata dictionary for each object under that 'prefix'/'folderJustBelowYourBucket'.
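
For example, a minimal sketch of iterating over that metadata (assuming the files have been uploaded under the prefix as above):

response = conn.list_objects(Bucket=bucket, Prefix=prefix)

# Each entry in 'Contents' describes one object stored under the prefix.
for obj in response.get('Contents', []):
    print(obj['Key'], obj['Size'], obj['LastModified'])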

CodePudding user response:

You can upload and download files between Amazon SageMaker and Amazon S3 using the SageMaker Python SDK. Its S3 utilities provide the S3Uploader and S3Downloader classes to easily work with S3 from within SageMaker Studio notebooks.
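
A minimal sketch, assuming the default bucket and prefix used earlier in this thread and a placeholder local file name:

from sagemaker.s3 import S3Uploader, S3Downloader
import sagemaker

bucket = sagemaker.Session().default_bucket()
s3_uri = f"s3://{bucket}/folderJustBelowYourBucket"

# Upload a local file (placeholder name) from the Studio file system to S3.
S3Uploader.upload(local_path="fileName.extension", desired_s3_uri=s3_uri)

# List what is stored under that S3 URI and download it back to local storage.
print(S3Downloader.list(s3_uri))
S3Downloader.download(s3_uri=s3_uri, local_path="downloaded/")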

A comment about the 'file system' in your question 2: the files are stored on the SageMaker Studio user profile's Amazon Elastic File System (Amazon EFS) volume, not on EBS (SageMaker classic notebook instances use EBS volumes). Refer to this blog for a more detailed overview of the SageMaker architecture.
