I'm looking for some advice from anyone who's tried AWS SageMaker. I'm very new to this and would appreciate help from anyone kind enough to offer it.
I have created a basic time-series project in a SageMaker notebook. It trains the model on CSV file data and tests it, with good results.
The data I am using is based on store profits, and I am predicting each store's profit each week.
However, my question is: how can I pass new store sales data into this model each week (the data arrives only one day a week), retrain it with the new week's data (so it can pick up any new patterns), and then have it predict the next week's profit for each store?
All my store data is synced into MongoDB, so I'm presuming I would need a Lambda function to fetch this data and pass it over to the SageMaker model.
Is it worth retraining the model every week, given that I have years' worth of store data? Or should I just pass over the old data with the new week's data appended and have it predict from that? And how do I pass over this data? In a Lambda function triggered automatically every week by a scheduled CloudWatch/EventBridge event?
Can I write the predictions back into MongoDB in a new collection, or are they saved somewhere else first, meaning this would have to be another Lambda function?
I have looked at so many tutorials, but none of them explain how to connect everything up so the model makes predictions automatically and then saves them in a DB.
Many thanks in advance to anyone who can explain this to me! Sorry for such a long question!
CodePudding user response:
You will need to use a Lambda function or Step Functions workflow to fetch the data from the MongoDB data source, as you rightly pointed out, and send it to the SageMaker model. Whether to retrain depends on the metrics you are tracking, such as accuracy, F1 score, and so on. To monitor model quality in SageMaker you can leverage SageMaker Model Monitor and trigger model retraining if there is a deviation from the expected results. You can use SageMaker Pipelines to orchestrate these pieces together.
SageMaker Pipelines is integrated with Model Monitor; see the SageMaker reference documentation for details.
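The "trigger retraining on a quality deviation" idea can be sketched as a simple check that kicks off a pipeline execution when error grows beyond a tolerance. This is a minimal sketch, not Model Monitor itself: the pipeline name, metric values, and 20% tolerance are illustrative assumptions, and the boto3 `start_pipeline_execution` call is the real API for starting a SageMaker pipeline run.

```python
# Sketch only: retrain when quality degrades, as the answer describes.
# PIPELINE_NAME and the tolerance are hypothetical, not from the original post.

PIPELINE_NAME = "store-profit-forecast"  # hypothetical pipeline name


def mae_degraded(current_mae: float, baseline_mae: float, tolerance: float = 0.2) -> bool:
    """Return True when current MAE exceeds the baseline by more than the
    given relative tolerance (20% by default)."""
    return current_mae > baseline_mae * (1.0 + tolerance)


def maybe_retrain(current_mae: float, baseline_mae: float) -> bool:
    """Start a new pipeline execution only if quality has degraded."""
    if not mae_degraded(current_mae, baseline_mae):
        return False
    import boto3  # imported lazily so the pure check above is testable offline

    sm = boto3.client("sagemaker")
    sm.start_pipeline_execution(PipelineName=PIPELINE_NAME)
    return True
```

In practice Model Monitor would compute the drift statistics for you; this only shows the shape of the decision that sits in front of the retraining trigger.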
CodePudding user response:
I recently completed a similar use case, and here are my answers:
Q1 : Is there a need to retrain every week?
Ans : Yes. You need continuous-training and continuous-forecasting steps (tied together using a SageMaker pipeline) in production to make it work as a fully automated system with stable MAE, MAPE, etc.
Q2 : How can I pass new data and forecast for the next week? How do I get the input data from MongoDB?
Ans : You could use a Lambda function, or a Glue job (designed for ETL, so a better fit), to drop the data into an S3 bucket. That bucket then becomes the raw input data bucket for the SageMaker pipeline.
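The Lambda variant of that step could look roughly like this. Everything specific here is an assumption for illustration: the bucket name, the `stores`/`sales` database and collection names, the CSV columns, and the dated key layout (which follows the YYMMDD-folder suggestion later in this answer).

```python
# Sketch of a weekly Lambda handler that exports the latest sales documents
# from MongoDB to S3 as CSV. Bucket, database, collection and field names
# are illustrative assumptions, not from the original post.
import csv
import datetime
import io

BUCKET = "store-sales-raw"  # hypothetical raw-data bucket


def s3_key_for(run_date: datetime.date) -> str:
    """Build the dated folder key the pipeline will read from."""
    return f"raw/{run_date:%Y%m%d}/sales.csv"


def rows_to_csv(rows, fields=("store_id", "week", "profit")) -> str:
    """Serialise MongoDB documents into a CSV string."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=list(fields), extrasaction="ignore")
    writer.writeheader()
    for row in rows:
        writer.writerow(row)
    return buf.getvalue()


def handler(event, context):
    # Imported lazily: pymongo/boto3 are only needed when running in Lambda.
    import boto3
    from pymongo import MongoClient

    client = MongoClient("mongodb://...")  # connection string elided
    rows = client["stores"]["sales"].find({})
    body = rows_to_csv(rows)
    key = s3_key_for(datetime.date.today())
    boto3.client("s3").put_object(Bucket=BUCKET, Key=key, Body=body.encode())
    return {"written": key}
```

A Glue job would do the same extract-and-write, just with Glue's own connectors instead of pymongo.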
Q3 : Can I write the predictions back into MongoDB in a new collection, or are they saved somewhere else first, requiring another Lambda function?
Ans : Yes you can, either way.
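The "another Lambda" route could be sketched like this: read the batch-transform output CSV from S3 and insert one document per row into a new collection. The `stores`/`predictions` collection names and the one-row-per-store `store_id,predicted_profit` output format are illustrative assumptions.

```python
# Sketch: turn batch-transform CSV output into MongoDB documents.
# Collection names and the CSV layout are illustrative assumptions.


def parse_prediction_line(line: str) -> dict:
    """Turn one 'store_id,predicted_profit' CSV line into a document."""
    store_id, predicted = line.strip().split(",")
    return {"store_id": store_id, "predicted_profit": float(predicted)}


def write_predictions(csv_text: str, mongo_uri: str) -> int:
    """Insert all prediction rows; returns the number written."""
    docs = [parse_prediction_line(l) for l in csv_text.splitlines() if l.strip()]
    from pymongo import MongoClient  # lazy import: only needed at runtime

    MongoClient(mongo_uri)["stores"]["predictions"].insert_many(docs)
    return len(docs)
```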
I would suggest starting small: first drop a CSV file to an S3 location, say under a YYMMDD folder. Use this as input and develop everything in one notebook (continuous training, continuous forecasting).
Later, learn about pipelines (how to write the different steps, how to pass objects between steps, and so on) and modify your code to fit into a pipeline.
Create a SageMaker pipeline with these steps (refer to the links below):
- Preprocess (any transformations, cleansing)
- Training (use a prebuilt image or build your own, depending on your algorithm)
- Forecast (either run a batch transform, or deploy to an endpoint and delete it afterwards)
- Post-processing (if required)
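The steps above can be sketched with the SageMaker Python SDK roughly as follows. This is a sketch under stated assumptions, not a definitive implementation: the processor/estimator configuration, instance types, script names, and the `<your-training-image>` / `s3://<bucket>` placeholders all depend on your algorithm; the Forecast and Postprocess steps follow the same pattern and are omitted for brevity.

```python
# Sketch of the pipeline steps listed above using the SageMaker Python SDK.
# All names, images, URIs and instance types are illustrative assumptions.

STEP_NAMES = ["Preprocess", "Train", "Forecast", "Postprocess"]


def build_pipeline(role: str, raw_data_uri: str):
    # Imported lazily so the step list above can be inspected without the SDK.
    from sagemaker.estimator import Estimator
    from sagemaker.inputs import TrainingInput
    from sagemaker.processing import ProcessingInput, ProcessingOutput
    from sagemaker.sklearn.processing import SKLearnProcessor
    from sagemaker.workflow.pipeline import Pipeline
    from sagemaker.workflow.steps import ProcessingStep, TrainingStep

    processor = SKLearnProcessor(
        framework_version="1.2-1",
        role=role,
        instance_type="ml.m5.large",
        instance_count=1,
    )
    preprocess = ProcessingStep(
        name="Preprocess",
        processor=processor,
        inputs=[ProcessingInput(source=raw_data_uri,
                                destination="/opt/ml/processing/input")],
        outputs=[ProcessingOutput(output_name="train",
                                  source="/opt/ml/processing/train")],
        code="preprocess.py",  # your cleansing/transformation script
    )
    estimator = Estimator(
        image_uri="<your-training-image>",  # prebuilt or custom, as noted above
        role=role,
        instance_count=1,
        instance_type="ml.m5.large",
        output_path="s3://<bucket>/model",
    )
    train = TrainingStep(
        name="Train",
        estimator=estimator,
        inputs={"train": TrainingInput(
            preprocess.properties.ProcessingOutputConfig
            .Outputs["train"].S3Output.S3Uri)},
    )
    # Forecast (batch transform or endpoint) and Postprocess steps would be
    # added here following the same pattern.
    return Pipeline(name="store-profit-forecast", steps=[preprocess, train])
```

Calling `build_pipeline(role, uri).upsert(role_arn=role)` registers the pipeline so it can be started on a schedule.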
Feed the output from the SageMaker pipeline into MongoDB. SageMaker Pipelines can be run on an automated schedule using Amazon EventBridge.
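A weekly EventBridge schedule for the pipeline could be set up roughly like this. The rule name, cron expression (Monday 06:00 UTC here), and the pipeline/role ARNs are illustrative placeholders; the role is one EventBridge assumes to start the pipeline on your behalf.

```python
# Sketch: a weekly EventBridge rule that starts the SageMaker pipeline.
# Rule name, schedule, and ARNs are illustrative assumptions.

SCHEDULE = "cron(0 6 ? * MON *)"  # every Monday at 06:00 UTC


def schedule_weekly_run(pipeline_arn: str, role_arn: str):
    import boto3  # lazy import so SCHEDULE is inspectable offline

    events = boto3.client("events")
    events.put_rule(Name="weekly-forecast", ScheduleExpression=SCHEDULE)
    events.put_targets(
        Rule="weekly-forecast",
        Targets=[{
            "Id": "pipeline",
            "Arn": pipeline_arn,
            "RoleArn": role_arn,  # role EventBridge assumes to start the run
        }],
    )
```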
Some references:
Example Pipelines (look here to learn about pipelines)