I'm deploying a SageMaker inference pipeline composed of two PyTorch models (`model_1` and `model_2`), and I am wondering if it's possible to pass the same input to both models composing the pipeline.
What I have in mind would work more or less as follows:
Invoke the endpoint sending a binary-encoded payload (namely `payload_ser`), for example:

```python
client.invoke_endpoint(
    EndpointName=ENDPOINT,
    ContentType="application/x-npy",
    Body=payload_ser,
)
```
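For context, a minimal way to produce such an `application/x-npy` payload from a NumPy array might look like this (the array shape and name are purely illustrative):

```python
import io
import numpy as np

# Hypothetical input array; replace with the actual model input.
input_array = np.random.rand(1, 3, 224, 224).astype(np.float32)

# Serialize to the binary .npy format matching ContentType="application/x-npy".
buffer = io.BytesIO()
np.save(buffer, input_array)
payload_ser = buffer.getvalue()
```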
The first model parses the payload with the `input_fn` function, runs the predictor on it, and returns the output of the predictor. As a simplified example:

```python
import json

def input_fn(request_body, request_content_type):
    if request_content_type == "application/x-npy":
        input = some_function_to_parse_input(request_body)
        return input

def predict_fn(input_object, predictor):
    outputs = predictor(input_object)
    return outputs

def output_fn(predictions, response_content_type):
    return json.dumps(predictions)
```
The second model gets as payload both the original payload (`payload_ser`) and the output of the previous model (the predictions). Possibly, the `input_fn` function would be used to parse the output of `model_1` (as in the "standard case"), but I'd need some way to also make the original payload available to `model_2`. In this way, `model_2` would use both the original payload and the output of `model_1` to make the final prediction and return it to whoever invoked the endpoint.
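To make the intent concrete, here is a rough, purely hypothetical sketch of what I would like `model_2`'s `predict_fn` to receive (`combine_inputs` is a placeholder helper, not something that exists):

```python
def predict_fn(parsed_request, predictor):
    # Hypothetical: input_fn would have unpacked both the original payload
    # and model_1's predictions from the incoming request.
    original_input, model_1_outputs = parsed_request
    combined = combine_inputs(original_input, model_1_outputs)  # placeholder
    return predictor(combined)
```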
Any idea if this is achievable?
CodePudding user response:
Sounds like you need an inference DAG. Amazon SageMaker inference pipelines currently support only a chain of handlers, where the output of handler N is the input for handler N+1.
You could change model_1's predict_fn() to return the pair (input_object, outputs). output_fn() will then receive these two objects as the predictions and can serialize both as JSON. model_2's input_fn() will need to know how to parse that combined payload.
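A minimal sketch of what that could look like, assuming the inputs and outputs can be converted to lists for JSON (the key names and conversions are my own choices, not anything SageMaker requires):

```python
import json
import numpy as np

# --- model_1's inference script ---

def predict_fn(input_object, predictor):
    outputs = predictor(input_object)
    # Return the original input together with the predictions so the next
    # container in the pipeline can see both.
    return input_object, outputs

def output_fn(prediction_pair, response_content_type):
    original_input, outputs = prediction_pair
    return json.dumps({
        "original_input": np.asarray(original_input).tolist(),   # assumes array-like input
        "model_1_outputs": np.asarray(outputs).tolist(),
    })

# --- model_2's inference script ---

def input_fn(request_body, request_content_type):
    # Assumes model_1's JSON response is forwarded as-is to this container.
    if request_content_type == "application/json":
        payload = json.loads(request_body)
        original_input = np.asarray(payload["original_input"])
        model_1_outputs = np.asarray(payload["model_1_outputs"])
        return original_input, model_1_outputs
    raise ValueError(f"Unsupported content type: {request_content_type}")
```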
Consider implementing this as a generic pipeline handling mechanism that adds the input to the model's output. This way you could reuse it for all models and pipelines.
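One way to make that reusable is a small helper that wraps any model's predict_fn and produces a matching output_fn; the helper name and JSON layout below are invented for illustration:

```python
import json

def with_input_passthrough(base_predict_fn):
    """Hypothetical helper: wrap a predict_fn so the original input travels
    along with the predictions, and provide a matching output_fn."""
    def predict_fn(input_object, model):
        return input_object, base_predict_fn(input_object, model)

    def output_fn(prediction_pair, response_content_type):
        original_input, predictions = prediction_pair
        # Assumes both pieces are JSON-serializable (convert tensors first if needed).
        return json.dumps({"original_input": original_input,
                           "predictions": predictions})

    return predict_fn, output_fn
```

In each model's inference script you could then do something like `predict_fn, output_fn = with_input_passthrough(my_predict_fn)` instead of hand-writing the passthrough logic per model.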
You could also allow each model to be deployed both as a standalone model and as part of a pipeline, and switch the relevant input/output handling behavior based on the presence of an environment variable (the Environment dict), which you can specify when creating the inference pipeline model.
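With the SageMaker Python SDK that could look roughly like the following sketch; the variable name INFERENCE_MODE, the S3 paths, entry points, and role ARN are all placeholders:

```python
from sagemaker.pytorch import PyTorchModel
from sagemaker.pipeline import PipelineModel

role = "arn:aws:iam::123456789012:role/MySageMakerRole"  # placeholder

model_1 = PyTorchModel(
    model_data="s3://my-bucket/model_1.tar.gz",
    role=role,
    entry_point="inference_model_1.py",
    framework_version="1.12",
    py_version="py38",
    env={"INFERENCE_MODE": "pipeline"},  # exposed as an env var inside the container
)

model_2 = PyTorchModel(
    model_data="s3://my-bucket/model_2.tar.gz",
    role=role,
    entry_point="inference_model_2.py",
    framework_version="1.12",
    py_version="py38",
    env={"INFERENCE_MODE": "pipeline"},
)

pipeline_model = PipelineModel(name="two-step-pipeline", role=role, models=[model_1, model_2])
pipeline_model.deploy(initial_instance_count=1, instance_type="ml.m5.xlarge")
```

Inside each inference script, output_fn can then check something like `os.environ.get("INFERENCE_MODE") == "pipeline"` to decide whether to forward the original input alongside the predictions or return the predictions alone.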