I've been asked to migrate on-premises Python ETL scripts that live on a syslog box over to AWS. These scripts run as cron jobs and output logs that a Splunk forwarder parses and sends to our Splunk instance for indexing.
My initial idea was to deploy a CloudWatch-triggered Lambda function that spins up an EC2 instance, runs the ETL scripts cloned onto that instance, and then terminates the instance. Another idea was to containerize the scripts and run them as ECS task definitions. Either way, the scripts take approximately 30 minutes to run.
Any guidance moving forward would be appreciated; I would like to deploy this as IaC, preferably with troposphere/boto3.
CodePudding user response:
"Another idea was to containerize the scripts and run them as task definitions"
This is probably the best approach. You can include the Splunk universal forwarder container in your task definition (ensuring both containers mount the same volume where the log files are written) to get the logs into Splunk; a sketch of that sidecar pattern follows below. You can schedule task execution with an EventBridge (CloudWatch Events) rule, much like you would schedule a Lambda function. As an alternative to the forwarder container, if you can configure the scripts to log to stdout/stderr instead of log files, you can simply set the Docker log driver on the ETL container to send output directly to Splunk.
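Since you mentioned troposphere, here is a minimal sketch of the sidecar layout. The ECR image URI, Splunk endpoint, volume path, and the universal forwarder environment variables are assumptions/placeholders to adapt (and the Splunk password should really come from ECS secrets rather than plain environment):

```python
from troposphere import Template
from troposphere.ecs import (
    ContainerDefinition,
    Environment,
    MountPoint,
    TaskDefinition,
    Volume,
)

template = Template()

# Shared scratch volume where the ETL scripts write their log files.
log_volume = Volume(Name="etl-logs")

etl_container = ContainerDefinition(
    Name="etl",
    Image="123456789012.dkr.ecr.us-east-1.amazonaws.com/etl:latest",  # hypothetical ECR image
    Essential=True,
    MountPoints=[MountPoint(SourceVolume="etl-logs", ContainerPath="/var/log/etl")],
)

forwarder_container = ContainerDefinition(
    Name="splunk-forwarder",
    Image="splunk/universalforwarder:latest",
    Essential=False,
    # Env vars per the splunk/universalforwarder image docs -- verify against your Splunk setup.
    Environment=[
        Environment(Name="SPLUNK_START_ARGS", Value="--accept-license"),
        Environment(Name="SPLUNK_STANDALONE_URL", Value="splunk.example.com"),  # placeholder
        Environment(Name="SPLUNK_ADD", Value="monitor /var/log/etl"),
        Environment(Name="SPLUNK_PASSWORD", Value="changeme"),  # use ECS secrets in practice
    ],
    MountPoints=[
        MountPoint(SourceVolume="etl-logs", ContainerPath="/var/log/etl", ReadOnly=True)
    ],
)

template.add_resource(
    TaskDefinition(
        "EtlTaskDefinition",
        Family="etl",
        Cpu="1024",
        Memory="2048",
        RequiresCompatibilities=["EC2"],
        ContainerDefinitions=[etl_container, forwarder_container],
        Volumes=[log_volume],
    )
)

print(template.to_yaml())
```

If you go the log-driver route instead, you would drop the sidecar and attach a LogConfiguration with LogDriver="splunk" (and the splunk-url/splunk-token options) to the ETL container.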
Assuming you don't already have a cluster with capacity to run the task, you can use a capacity provider backed by the ASG attached to the ECS cluster to automatically provision instances into the cluster whenever the task needs to run (and scale back down after the task completes); see the sketch below.
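A rough troposphere sketch of that wiring, assuming the Auto Scaling group already exists (its ARN here is a placeholder) and that managed scaling is what you want:

```python
from troposphere import Ref, Template
from troposphere.ecs import (
    AutoScalingGroupProvider,
    CapacityProvider,
    CapacityProviderStrategyItem,
    Cluster,
    ManagedScaling,
)

template = Template()

# Capacity provider wrapping an existing Auto Scaling group.
capacity_provider = template.add_resource(
    CapacityProvider(
        "EtlCapacityProvider",
        AutoScalingGroupProvider=AutoScalingGroupProvider(
            AutoScalingGroupArn="arn:aws:autoscaling:us-east-1:123456789012:autoScalingGroup:...",  # placeholder
            ManagedScaling=ManagedScaling(
                Status="ENABLED",
                TargetCapacity=100,  # keep the ASG sized to exactly what running tasks need
            ),
            ManagedTerminationProtection="DISABLED",
        ),
    )
)

# Cluster that uses the capacity provider by default, so scheduled tasks
# trigger instance provisioning automatically.
template.add_resource(
    Cluster(
        "EtlCluster",
        ClusterName="etl-cluster",
        CapacityProviders=[Ref(capacity_provider)],
        DefaultCapacityProviderStrategy=[
            CapacityProviderStrategyItem(
                CapacityProvider=Ref(capacity_provider),
                Weight=1,
            )
        ],
    )
)

print(template.to_yaml())
```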
Or use Fargate tasks with EFS storage, and you don't have to worry about cluster provisioning at all.
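For completeness, a sketch of a Fargate task definition with an EFS-backed volume (the file system ID, image, and execution role ARN are placeholders; EFS on Fargate requires platform version 1.4.0 or later). The forwarder sidecar or splunk log driver from above applies the same way here:

```python
from troposphere import Template
from troposphere.ecs import (
    ContainerDefinition,
    EFSVolumeConfiguration,
    MountPoint,
    TaskDefinition,
    Volume,
)

template = Template()

template.add_resource(
    TaskDefinition(
        "EtlFargateTaskDefinition",
        Family="etl-fargate",
        RequiresCompatibilities=["FARGATE"],
        NetworkMode="awsvpc",  # required for Fargate
        Cpu="1024",
        Memory="2048",
        ExecutionRoleArn="arn:aws:iam::123456789012:role/ecsTaskExecutionRole",  # placeholder
        ContainerDefinitions=[
            ContainerDefinition(
                Name="etl",
                Image="123456789012.dkr.ecr.us-east-1.amazonaws.com/etl:latest",  # hypothetical image
                Essential=True,
                MountPoints=[
                    MountPoint(SourceVolume="etl-logs", ContainerPath="/var/log/etl")
                ],
            )
        ],
        Volumes=[
            Volume(
                Name="etl-logs",
                EFSVolumeConfiguration=EFSVolumeConfiguration(
                    FilesystemId="fs-12345678",  # placeholder EFS file system ID
                    TransitEncryption="ENABLED",
                ),
            )
        ],
    )
)

print(template.to_yaml())
```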