Unable to copy object from bucket to another bucket in Lambda function

Time: 03-31

I have a Lambda function that replicates files of a certain format from one bucket to another, triggered by a PUT event on the source bucket. No errors show up in the CloudWatch logs, but the file never gets copied. This only happens with this key, which is partitioned by date.

Lambda Event

{
  "Records": [
    {
      "s3": {
        "s3SchemaVersion": "1.0",
        "configurationId": "lasic2-artifacts",
        "bucket": {
          "name": "BUCKETNAME",
          "arn": "arn:aws:s3:::BUCKETNAME"
        },
        "object": {
          "key": "models/operatorai-model-store/lasic2/2022/03/08/10:21:05/artifacts.tar.gz"
        }
      }
    }
  ]
}

Lambda Function

import boto3
from botocore.exceptions import ClientError

print("Loading function")

s3 = boto3.client("s3", region_name="us-east-1")

class NoRecords(Exception):
    """
    Exception thrown when there are no records found from
    s3:ObjectCreated:Put
    """

def get_source(bucket, key):
    """
    Returns the source object to be passed when copying over the contents from
    bucket A to bucket B
    :param bucket: name of the bucket to copy the key from
    :param key: the path of the object to copy
    """
    return {
        "Bucket": bucket,
        "Key": key,
    }


def process_record(
    record,
    production_bucket,
    staging_bucket,
):
    """
    Process individual records (an example record can be found here
    https://docs.aws.amazon.com/lambda/latest/dg/with-s3-example.html#test-manual-invoke)
    :param record: a record from s3:ObjectCreated:Put
    :param production_bucket: name of the production bucket which comes from
    the records
    :param staging_bucket: name of the staging bucket to save the key from the
    production_bucket into
    """
    key = record["s3"]["object"]["key"]
    print(f"Key: \n{key}")
    try:
        s3_response = s3.get_object(Bucket=production_bucket, Key=key)
        s3_object = s3_response["Body"].read()
        copy_source = get_source(bucket=production_bucket, key=key)
        s3.copy_object(
            Bucket=staging_bucket,
            Key=key,
            CopySource=copy_source,
            ACL="bucket-owner-full-control",
        )
    except ClientError as error:
        error_code = error.response["Error"]["Code"]
        error_message = error.response["Error"]["Message"]
        if error_code == "NoSuchBucket":
            print(error_message)
            raise
    except Exception as error:
        print(f"Failed to upload {key}")
        print(error)
        raise


def lambda_handler(event, _):
    print(f"Event: \n{event}")
    records = event["Records"]
    num_records = len(records)
    if num_records == 0:
        raise NoRecords("No records found")
    record = records[0]
    production_bucket = record["s3"]["bucket"]["name"]
    staging_bucket = f"{production_bucket}-staging"
    process_record(
        record=record,
        production_bucket=production_bucket,
        staging_bucket=staging_bucket,
    )

CodePudding user response:

The hint to the issue is in the event that you received:

"key": "models/operatorai-model-store/lasic2/2022/03/08/10:21:05/artifacts.tar.gz"

You can see that the object key here is URL encoded: the colons in the timestamp segment have become %3A. The documentation is explicit about this:

The s3 key provides information about the bucket and object involved in the event. The object key name value is URL encoded. For example, "red flower.jpg" becomes "red+flower.jpg" (Amazon S3 returns "application/x-www-form-urlencoded" as the content type in the response).
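
For this particular event, the encoding shows up in the timestamp segment of the key, where each ":" has become "%3A". A quick way to confirm this locally (a small sketch; raw_key is just the string copied from the event above):

import urllib.parse

raw_key = "models/operatorai-model-store/lasic2/2022/03/08/10%3A21%3A05/artifacts.tar.gz"
# unquote_plus reverses the form-style URL encoding S3 applies to event keys,
# turning "%3A" back into ":" (and "+" back into spaces)
print(urllib.parse.unquote_plus(raw_key))
# models/operatorai-model-store/lasic2/2022/03/08/10:21:05/artifacts.tar.gz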

Since the SDK APIs you call through boto3 expect an unencoded key, you'll need to decode the object key as it comes into the Lambda before using it:

import urllib.parse
# ....
    key = record["s3"]["object"]["key"]
    key = urllib.parse.unquote_plus(key)
    print(f"Key: \n{key}")