I am iterating on a model inference deployment using script mode in SageMaker (currently running in local mode) and update my inference.py entry point script often. Every time I change it, I have to recreate the model instance like this:
model_instance = PyTorchModel(
    model_data=model_tar_path,
    role=role,
    source_dir="code",
    entry_point="inference.py",
    framework_version="1.8",
    py_version="py3",
)
and then call
predictor = model_instance.deploy(
    initial_instance_count=1,
    endpoint_name="some_name",
    instance_type=instance_type,
    serializer=JSONSerializer(),
    deserializer=JSONDeserializer(),
)
over and over every time I change something. This takes a long time because it essentially starts a new Docker container (remember, this is running in local mode) and waits for all dependencies to install before I can do anything with it. And if there is any error at all, I have to go through the whole process again.

I'd like to explore whether I can use the update_endpoint functionality to essentially redeploy the endpoint within the same container, without having to recreate a new container every time and then wait for all the dependency installations.
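For context, this is roughly what I was hoping to be able to do after each edit to inference.py instead of a full redeploy (just a sketch; I don't know whether this works in local mode):

# hoped-for workflow: update the existing endpoint in place
predictor.update_endpoint(
    initial_instance_count=1,
    instance_type=instance_type,
)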
Answer:
SageMaker local mode is designed to imitate the hosted environment. As such, a new container is started on every deploy or update.

For faster development, I usually bake all the packages I can into the container, which removes the need to install them on every deploy.
That is, you can extend the SageMaker PyTorch container and bake your packages into it instead of using a requirements.txt. You can then push the image to ECR and specify it in the PyTorchModel.
https://docs.aws.amazon.com/sagemaker/latest/dg/prebuilt-containers-extend.html
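A minimal sketch of such a Dockerfile, assuming a CPU inference image for PyTorch 1.8 in us-east-1 (look up the exact base image URI and tag for your region and framework version in the AWS Deep Learning Containers list):

# Extend the prebuilt SageMaker PyTorch inference image
# (example URI; substitute the right account/region/tag for your setup).
FROM 763104351884.dkr.ecr.us-east-1.amazonaws.com/pytorch-inference:1.8.1-cpu-py3

# Bake in the packages that currently live in code/requirements.txt,
# so they no longer get installed on every deploy.
COPY code/requirements.txt /tmp/requirements.txt
RUN pip install --no-cache-dir -r /tmp/requirements.txt

Build the image with docker build and push it to ECR with docker push; the local-mode container will then start with everything already installed.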
Then, in your PyTorchModel:
model_instance = PyTorchModel(
    image_uri=<YourImageECRURI>,
    model_data=model_tar_path,
    role=role,
    source_dir="code",
    entry_point="inference.py",
    framework_version="1.8",
    py_version="py3",
)