compress .txt file on s3 location to .gz file


I need to compress a .txt file that is in an S3 location to .gz and then upload it to a different S3 bucket. I have written the following code, but it's not working as expected:

import gzip
import shutil
from io import BytesIO

import boto3

s3 = boto3.client('s3')

def upload_gzipped(bucket, key, fp, compressed_fp=None, content_type='text/plain'):
    with gzip.GzipFile(fileobj=compressed_fp, mode='wb') as gz:
        shutil.copyfileobj(fp, gz)
    compressed_fp.seek(0)
    print(compressed_fp)

    bucket.upload_fileobj(
        compressed_fp,
        key,
        {'ContentType': content_type, 'ContentEncoding': 'gzip'})

source_bucket = event['Records'][0]['s3']['bucket']['name']
file_key_name = event['Records'][0]['s3']['object']['key']

response = s3.get_object(Bucket=source_bucket, Key=file_key_name)
original = BytesIO(response['Body'].read())
original.seek(0)

upload_gzipped(source_bucket, file_key_name, original)

Can someone please help here, or suggest another approach to gzip a file in an S3 location?

CodePudding user response:

It would appear that you are writing an AWS Lambda function.

A simpler program flow would probably be:

  • Download the file to /tmp/ using s3_client.download_file()
  • Gzip the file
  • Upload the file to S3 using s3_client.upload_file()
  • Delete the files in /tmp/, as shown in the sketch below
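
For example, a minimal sketch of that flow (the DEST_BUCKET name and the compress_object helper are assumptions for illustration, not part of your original code):

import gzip
import os
import shutil

import boto3

s3_client = boto3.client('s3')
DEST_BUCKET = 'my-destination-bucket'  # assumption: replace with your target bucket

def compress_object(source_bucket, key):
    # Work in Lambda's writable /tmp/ directory
    local_path = '/tmp/' + os.path.basename(key)
    gzip_path = local_path + '.gz'

    # 1. Download the original object
    s3_client.download_file(source_bucket, key, local_path)

    # 2. Gzip the file on disk
    with open(local_path, 'rb') as f_in, gzip.open(gzip_path, 'wb') as f_out:
        shutil.copyfileobj(f_in, f_out)

    # 3. Upload the compressed copy to the destination bucket
    s3_client.upload_file(
        gzip_path, DEST_BUCKET, key + '.gz',
        ExtraArgs={'ContentType': 'text/plain', 'ContentEncoding': 'gzip'})

    # 4. Clean up /tmp/ so repeated invocations don't fill its limited space
    os.remove(local_path)
    os.remove(gzip_path)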

Also, please note that the AWS Lambda function might be invoked with multiple objects being passed via the event. However, your code is currently only processing the first record with event['Records'][0]. The program should loop through these records like this:

for record in event['Records']:
    source_bucket = record['s3']['bucket']['name']
    file_key_name = record['s3']['object']['key']
    ...
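
Putting the two together, the handler might look like this (using the hypothetical compress_object helper sketched above):

def lambda_handler(event, context):
    for record in event['Records']:
        source_bucket = record['s3']['bucket']['name']
        file_key_name = record['s3']['object']['key']
        compress_object(source_bucket, file_key_name)

This keeps the per-object work in one function, so each record in the event is handled the same way regardless of how many objects triggered the invocation.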