How to download newly uploaded files from S3 to EC2 every time


I have an S3 bucket that receives new files throughout the day. I want to download each file to my EC2 instance as soon as it is uploaded to the bucket.

I have read that this is possible using SQS, SNS, or Lambda. Which is the easiest of them all? I need the file to be downloaded as soon as possible after it is uploaded to the bucket.

EDIT

Basically, PNG images will arrive in the bucket every few seconds or minutes. Every time a new image is uploaded, I want to download it onto the instance, which is already running, and do some AI processing on it. As images keep coming into the bucket, I want to keep downloading them to the EC2 instance and process each one as soon as possible. This is my code in the Lambda function so far.

import time

import boto3


def lambda_handler(event, context):
    """Download the newly uploaded file from S3 on trigger."""
    ssm = boto3.client("ssm")
    instanceid = "******"

    if event:
        file_obj = event["Records"][0]
        bucketname = str(file_obj["s3"]["bucket"]["name"])
        print(bucketname)
        filename = str(file_obj["s3"]["object"]["key"])
        print(filename)

        # Run the AWS CLI on the instance to copy the object locally.
        response = ssm.send_command(
            InstanceIds=[instanceid],
            DocumentName="AWS-RunShellScript",
            Parameters={
                "commands": [f"aws s3 cp s3://{bucketname}/{filename} ."]
            },
        )

        # Fetch the command ID so we can look up the command's output.
        command_id = response["Command"]["CommandId"]

        time.sleep(3)

        # Fetch the command output.
        output = ssm.get_command_invocation(CommandId=command_id, InstanceId=instanceid)
        print(output)
    return

However, I am getting the following error:

Test Event Name
test

Response
{
  "errorMessage": "2021-12-01T14:11:30.781Z 88dbe51b-53d6-4c06-8c16-207698b3a936 Task timed out after 3.00 seconds"
}

Function Logs
START RequestId: 88dbe51b-53d6-4c06-8c16-207698b3a936 Version: $LATEST
END RequestId: 88dbe51b-53d6-4c06-8c16-207698b3a936
REPORT RequestId: 88dbe51b-53d6-4c06-8c16-207698b3a936  Duration: 3003.58 ms    Billed Duration: 3000 ms    Memory Size: 128 MB Max Memory Used: 87 MB  Init Duration: 314.81 ms    
2021-12-01T14:11:30.781Z 88dbe51b-53d6-4c06-8c16-207698b3a936 Task timed out after 3.00 seconds

Request ID
88dbe51b-53d6-4c06-8c16-207698b3a936

When I remove all the lines related to SSM, it works fine. Is this a permissions issue, or is there a problem with the code?

EDIT2

My code runs without errors, but I don't see any output or change on my EC2 instance. I should be seeing an empty text file in the home directory, but I don't see anything. Code:

import time
import logging

import boto3

logger = logging.getLogger()
logger.setLevel(logging.INFO)


def lambda_handler(event, context):
    """Run a test command on the EC2 instance on an S3 trigger."""
    ssm = boto3.client("ssm")
    instanceid = "******"
    print("HI")
    if event:
        file_obj = event["Records"][0]
        bucketname = str(file_obj["s3"]["bucket"]["name"])
        print(bucketname)
        filename = str(file_obj["s3"]["object"]["key"])
        print(filename)
        print("sending")
        try:
            # Create an empty test file on the instance.
            response = ssm.send_command(
                InstanceIds=[instanceid],
                DocumentName="AWS-RunShellScript",
                Parameters={
                    "commands": ["touch hi.txt"]
                },
            )
            # Fetch the command ID so we can look up the command's output.
            command_id = response["Command"]["CommandId"]

            time.sleep(3)

            # Fetch the command output.
            output = ssm.get_command_invocation(CommandId=command_id, InstanceId=instanceid)
            print(output)

        except Exception as e:
            logger.error(e)
            raise e

CodePudding user response:

There are several ways. One would be to set up S3 event notifications to invoke a Lambda function. The Lambda function would then use SSM Run Command to execute an AWS CLI S3 command on your instance to download the file from S3.
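
For illustration, a minimal sketch of such a function (untested; the instance ID and destination path are placeholders, and the Lambda's execution role needs ssm:SendCommand):

import boto3

ssm = boto3.client("ssm")


def lambda_handler(event, context):
    record = event["Records"][0]
    bucket = record["s3"]["bucket"]["name"]
    key = record["s3"]["object"]["key"]
    # AWS-RunShellScript runs as root in a default working directory,
    # so give the copy an explicit destination path.
    ssm.send_command(
        InstanceIds=["i-0123456789abcdef0"],  # placeholder instance ID
        DocumentName="AWS-RunShellScript",
        Parameters={"commands": [f"aws s3 cp s3://{bucket}/{key} /home/ec2-user/"]},
    )

The instance itself also needs the SSM agent running and an instance profile that allows it to read from the bucket.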

CodePudding user response:

I don't know why Lambda is being recommended here. What you need is simple: an S3 object-created event notification -> SQS, and a job on your EC2 instance watching the queue with long polling.

Here is an example of such a Python script. Note that the S3 event arrives JSON-encoded in the message body, and object keys are URL-encoded inside it. I haven't tested this, but it should be pretty close.

import json
import urllib.parse

import boto3

QUEUE_URL = "yourQueue"


def main() -> None:
    s3 = boto3.client("s3")
    sqs = boto3.client("sqs")
    while True:
        # Long poll: block for up to 20 seconds waiting for a message.
        res = sqs.receive_message(
            QueueUrl=QUEUE_URL,
            WaitTimeSeconds=20,
        )
        for msg in res.get("Messages", []):
            # The S3 event notification is JSON in the message body.
            for record in json.loads(msg["Body"])["Records"]:
                bucket = record["s3"]["bucket"]["name"]
                # Object keys are URL-encoded in the event.
                key = urllib.parse.unquote_plus(record["s3"]["object"]["key"])
                s3.download_file(bucket, key, "local/file/path")
            # Delete the message so it is not delivered again.
            sqs.delete_message(QueueUrl=QUEUE_URL, ReceiptHandle=msg["ReceiptHandle"])


if __name__ == "__main__":
    main()
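
For this to work, the bucket's event notifications must point at the queue, and the queue's access policy must allow S3 to send messages to it. A minimal sketch of the notification side (bucket name and queue ARN are placeholders):

import boto3

s3 = boto3.client("s3")

# Route all "object created" events for the bucket to the SQS queue.
# The queue's access policy must allow the S3 service principal to
# call sqs:SendMessage, or this configuration will be rejected.
s3.put_bucket_notification_configuration(
    Bucket="yourBucket",
    NotificationConfiguration={
        "QueueConfigurations": [
            {
                "QueueArn": "arn:aws:sqs:us-east-1:123456789012:yourQueue",
                "Events": ["s3:ObjectCreated:*"],
            }
        ]
    },
)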

CodePudding user response:

You can use S3 Event Notifications, which react to a new file arriving in the S3 bucket. The supported destinations for S3 events are SNS, SQS, and AWS Lambda.

You can use a Lambda function directly as the destination, as described by @Marcin.

You can use SQS as a queue with a Lambda behind it pulling from the queue. This gives you capabilities such as a dead-letter queue. You can then pull messages from the queue using different methods:

  • AWS CLI
  • AWS SDK

You can use SNS with different destinations behind it (you can have many of these destinations at once, which is the fan-out pattern):

  • an SQS queue to manage the files
  • an email notification
  • a Lambda function
  • ...

You can find more explanation in this article: https://aws.plainenglish.io/system-design-s3-events-to-lambda-vs-s3-events-to-sqs-sns-to-lambda-2d41477d1cc9
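
As an illustration of the fan-out wiring (untested; the topic and queue ARNs are placeholders), subscribing a queue to the topic looks like this:

import boto3

sns = boto3.client("sns")

# Fan-out: subscribe an SQS queue to the SNS topic that receives the
# S3 events; more queues, emails, or Lambda functions can be added to
# the same topic in the same way.
sns.subscribe(
    TopicArn="arn:aws:sns:us-east-1:123456789012:yourTopic",
    Protocol="sqs",
    Endpoint="arn:aws:sqs:us-east-1:123456789012:yourQueue",
)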
