Extract model saved in S3 bucket as tar.gz format to sagemaker notebook instance

Time:07-29

I have a tar.gz file in an S3 bucket. It contains 6 different pickled models archived together, and it was created by training with a SageMaker Docker container in a single run.

To run inference, I would like to untar the archive into separate models and call model.predict() on the test data with each one.

My S3 bucket structure: 's3:///output/train_best_params/model.tar.gz'

How can I download the archive to a SageMaker notebook instance and extract the 6 models from it as model1, model2, ...?

If I simply use sagemaker.model.Model(), I can't run inference, because the resulting model object has multiple models inside it.

Thank you

CodePudding user response:

Ideally you would write a script that downloads the tar.gz file from the training job output and separates the models into individual tar.gz files (6 in this case), each with its corresponding inference.py file. You can then deploy them separately using batch transform: https://github.com/aws/amazon-sagemaker-examples/blob/main/sagemaker_batch_transform/pytorch_mnist_batch_transform/pytorch-mnist-batch-transform.ipynb
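
For the download-and-extract part inside the notebook, something like the following might work. It is only a sketch: the function names and the local paths are made up for illustration, and it assumes every file inside the archive is a pickle that can be loaded with the libraries installed on the notebook instance (the same ones used in training).

```python
import pickle
import tarfile
from pathlib import Path


def download_archive(bucket, key, dest="model.tar.gz"):
    """Copy s3://<bucket>/<key> to a local file on the notebook instance."""
    import boto3  # imported lazily so the helper below also works offline

    boto3.client("s3").download_file(bucket, key, dest)
    return dest


def load_pickled_models(archive_path, extract_dir="extracted_models"):
    """Extract the archive and unpickle every regular file found inside it."""
    out = Path(extract_dir)
    out.mkdir(parents=True, exist_ok=True)
    with tarfile.open(archive_path, "r:gz") as tar:
        tar.extractall(out)  # only extract archives you trust
    models = {}
    for path in sorted(out.rglob("*")):
        if path.is_file():
            with path.open("rb") as fh:
                # Assumption: each file is a pickled model object.
                models[path.name] = pickle.load(fh)
    return models
```

With the bucket name filled in, download_archive("<your-bucket>", "output/train_best_params/model.tar.gz") followed by load_pickled_models("model.tar.gz") should return a dict keyed by file name, so each entry can be assigned to model1, model2, and so on before calling predict() on it.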

CodePudding user response:

  1. If your models all predict the same thing, recheck the training script to see whether it saves the model multiple times or whether the files are just checkpoints. I suggest picking the best one based on validation, or removing the duplicates.

  2. If your 6 models predict different features and you want to run them all at the same time, I suggest you download the file, extract it, compress each model separately, and upload the archives back separately.
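
The repackaging in option 2 could look roughly like this. It is a sketch under assumptions: the helper names, the output folder, and the S3 prefix are hypothetical, and it expects the original archive to be already extracted into a local folder with one file per model. It does not add an inference script to the new archives.

```python
import tarfile
from pathlib import Path


def repackage_models(extract_dir, out_dir="repackaged"):
    """Write one <stem>.tar.gz per model file found in extract_dir."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    archives = []
    for path in sorted(Path(extract_dir).iterdir()):
        if not path.is_file():
            continue
        target = out / f"{path.stem}.tar.gz"
        with tarfile.open(target, "w:gz") as tar:
            # Store the file at the archive root under its original name.
            tar.add(path, arcname=path.name)
        archives.append(target)
    return archives


def upload_archives(archives, bucket, prefix):
    """Upload each archive to s3://<bucket>/<prefix>/<archive name>."""
    import boto3  # imported lazily so repackaging works without AWS access

    s3 = boto3.client("s3")
    for archive in archives:
        s3.upload_file(str(archive), bucket, f"{prefix}/{archive.name}")
```

Each uploaded archive can then be used as the model data location of its own SageMaker model.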

If creating the models is a one-time task, you don't need a script; you can create the model in SageMaker directly. Here's an example of how I did it:

import boto3

sm_client = boto3.client(service_name='sagemaker')

# Container definition: inference image in ECR plus the model artifact in S3.
container = {
    'Image': 'my_account_id.dkr.ecr.my_region.amazonaws.com/my_model_repo_name:latest',
    'ModelDataUrl': 'path/to/your/model.tar.gz',  # must be an s3:// URI to the archive
    'Mode': 'SingleModel'
}
create_model_response = sm_client.create_model(
    ModelName='my_model',
    ExecutionRoleArn='my_role',  # full ARN of the IAM role
    Containers=[container],
    # Tags are a list of {'Key': ..., 'Value': ...} pairs.
    Tags=[{'Key': 'my_tag', 'Value': 'somethingcool'}])

print("Model Arn: " + create_model_response['ModelArn'])