What are SageMaker pipelines actually?

Time:12-02

SageMaker Pipelines are rather unclear to me. I'm not experienced in the field of ML, but I'm trying to figure out the pipeline definitions.

I have a few questions:

  • Is SageMaker Pipelines a stand-alone service/feature? I don't see any option to create pipelines through the console, though I do see CloudFormation and CDK resources.

  • Is a SageMaker pipeline essentially CodePipeline? How do they integrate, and how do they differ?

  • There's also a Python SDK. How does it differ from the CDK and CloudFormation?

I can't seem to find any examples besides the Python SDK usage. Why is that?

The docs and workshops seem to properly describe only the Python SDK usage. It would be really helpful if someone could clear this up for me!

Answer:

SageMaker has two things called Pipelines: Model Building Pipelines and Serial Inference Pipelines. I believe you're referring to the former.

A model building pipeline is defined as a JSON document, and SageMaker hosts and runs it for you in a managed, serverless fashion.
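To make "defined in JSON" concrete, here is a minimal sketch of what such a definition can look like, written as a Python dict and serialized. The top-level fields (Version, Metadata, Parameters, Steps) and the {"Get": "Parameters.<name>"} reference syntax follow what the Python SDK emits, as far as I know. The parameter name ErrorText and step name StopHere are hypothetical, and a real pipeline would contain Processing or Training steps rather than a lone Fail step.

```python
import json

# Hypothetical pipeline: one String parameter and a single Fail step.
# In real pipelines, the "Arguments" of Processing/Training steps mirror
# the corresponding SageMaker API requests (e.g. CreateTrainingJob).
definition = {
    "Version": "2020-12-01",
    "Metadata": {},
    "Parameters": [
        {"Name": "ErrorText", "Type": "String", "DefaultValue": "stopped on purpose"},
    ],
    "Steps": [
        {
            "Name": "StopHere",
            "Type": "Fail",
            # {"Get": "Parameters.<name>"} is resolved by SageMaker at run time
            "Arguments": {"ErrorMessage": {"Get": "Parameters.ErrorText"}},
        },
    ],
}

# The service receives the definition as a JSON string.
definition_json = json.dumps(definition, indent=2)
print(definition_json)
```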

Is SageMaker Pipelines a stand-alone service/feature? I don't see any option to create pipelines through the console, though I do see CloudFormation and CDK resources.

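You can create and modify them through the SageMaker API (CreatePipeline, UpdatePipeline), which you can call from the AWS CLI, the Python SDK, or CloudFormation (and therefore the CDK, which synthesizes CloudFormation). All of these use the same AWS API under the hood.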
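A rough sketch of what every one of those tools ends up sending: the CreatePipeline request. The field names PipelineName, PipelineDefinition, and RoleArn are real CreatePipeline parameters; the helper function, pipeline name, definition, and role ARN are hypothetical placeholders. The sketch only builds the request, so it runs without AWS credentials.

```python
import json


def build_create_pipeline_request(name, definition, role_arn):
    """Hypothetical helper: keyword arguments for CreatePipeline (boto3: create_pipeline)."""
    return {
        "PipelineName": name,
        # The API takes the definition as a JSON *string*, not a dict.
        "PipelineDefinition": json.dumps(definition),
        # IAM role that SageMaker assumes when running the pipeline's steps.
        "RoleArn": role_arn,
    }


# Hypothetical values for illustration only.
request = build_create_pipeline_request(
    name="my-demo-pipeline",
    definition={"Version": "2020-12-01", "Metadata": {}, "Parameters": [], "Steps": []},
    role_arn="arn:aws:iam::123456789012:role/MySageMakerPipelineRole",
)

# With AWS credentials configured, the same fields would be sent like this:
#   import boto3
#   boto3.client("sagemaker").create_pipeline(**request)
# update_pipeline accepts the same fields to modify an existing pipeline.
```

The CLI equivalent is `aws sagemaker create-pipeline` with `--pipeline-name`, `--pipeline-definition`, and `--role-arn`; the Python SDK's `Pipeline.create()`/`upsert()` ultimately make the same call.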

You can start, stop, and view them in SageMaker Studio:

Left-side navigation bar > SageMaker resources > drop-down menu > Pipelines
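The same actions are available through the API if you prefer not to use Studio. This sketch builds a StartPipelineExecution request (PipelineName plus a list of {"Name", "Value"} parameter overrides); the helper, pipeline name, and ErrorText parameter are hypothetical. The boto3 calls in the comments are real SageMaker client methods but are left commented out so the sketch runs offline.

```python
def build_start_execution_request(pipeline_name, parameters):
    """Hypothetical helper: keyword arguments for StartPipelineExecution."""
    return {
        "PipelineName": pipeline_name,
        # Overrides for the pipeline's declared parameters; values are strings.
        "PipelineParameters": [
            {"Name": name, "Value": str(value)} for name, value in parameters.items()
        ],
    }


request = build_start_execution_request("my-demo-pipeline", {"ErrorText": "hello"})

# With credentials configured (hypothetical pipeline name):
#   import boto3
#   sm = boto3.client("sagemaker")
#   arn = sm.start_pipeline_execution(**request)["PipelineExecutionArn"]
#   resp = sm.list_pipeline_executions(PipelineName="my-demo-pipeline")
#   for s in resp["PipelineExecutionSummaries"]:
#       print(s["PipelineExecutionArn"], s["PipelineExecutionStatus"])
#   sm.stop_pipeline_execution(PipelineExecutionArn=arn)
```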

Is a SageMaker pipeline essentially CodePipeline? How do they integrate, and how do they differ?

No. CodePipeline is a general-purpose CI/CD service for building and deploying code; it is not specific to SageMaker. As far as I can tell, there is no direct integration between the two.

There's also a Python SDK. How does it differ from the CDK and CloudFormation?

The Python SDK is a stand-alone library for interacting with SageMaker in a developer-friendly way. I'd say it is more dynamic than CloudFormation: it lets you build pipelines in code, whereas CloudFormation takes the pipeline definition as a static JSON string.
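To illustrate the "static JSON string" point, here is a sketch of a CloudFormation template built in Python. AWS::SageMaker::Pipeline, PipelineName, RoleArn, and PipelineDefinition.PipelineDefinitionBody are the real resource type and properties; the logical ID DemoPipeline, the names, the role ARN, and the definition itself are hypothetical. The CDK's CfnPipeline construct synthesizes this same resource.

```python
import json

# Hypothetical definition; in practice it might come from the Python SDK's
# Pipeline.definition() or be written by hand.
definition = {
    "Version": "2020-12-01",
    "Metadata": {},
    "Parameters": [{"Name": "ErrorText", "Type": "String", "DefaultValue": "stopped"}],
    "Steps": [
        {
            "Name": "StopHere",
            "Type": "Fail",
            "Arguments": {"ErrorMessage": {"Get": "Parameters.ErrorText"}},
        },
    ],
}

template = {
    "AWSTemplateFormatVersion": "2010-09-09",
    "Resources": {
        "DemoPipeline": {
            "Type": "AWS::SageMaker::Pipeline",
            "Properties": {
                "PipelineName": "my-demo-pipeline",
                "RoleArn": "arn:aws:iam::123456789012:role/MySageMakerPipelineRole",
                # CloudFormation only sees this final string; any loops or
                # conditionals that produced it ran before deployment.
                "PipelineDefinition": {"PipelineDefinitionBody": json.dumps(definition)},
            },
        }
    },
}

print(json.dumps(template, indent=2))
```

The design trade-off is visible here: the SDK lets you compute the definition with ordinary Python, while CloudFormation just stores and deploys the resulting string.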

You can also define model-building workflows as AWS Step Functions state machines, using the AWS Step Functions Data Science SDK.
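Step Functions state machines use a different JSON format, the Amazon States Language, which the Data Science SDK generates for you. Below is a hand-written, heavily trimmed sketch of a single Task state using the real `arn:aws:states:::sagemaker:createTrainingJob.sync` service integration. The state name Train and the JobName input field are hypothetical, and the Parameters block is deliberately incomplete, so this is not deployable as-is.

```python
import json

# Amazon States Language: one Task state that starts a SageMaker training
# job and waits for it to finish (the ".sync" integration pattern).
state_machine = {
    "Comment": "Hypothetical training workflow (trimmed)",
    "StartAt": "Train",
    "States": {
        "Train": {
            "Type": "Task",
            "Resource": "arn:aws:states:::sagemaker:createTrainingJob.sync",
            # Trimmed: a real CreateTrainingJob request also needs
            # AlgorithmSpecification, RoleArn, OutputDataConfig, and more.
            "Parameters": {"TrainingJobName.$": "$.JobName"},
            "End": True,
        }
    },
}

asl_json = json.dumps(state_machine, indent=2)
print(asl_json)
```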
