Python rookie here. I have a requirement that I have been researching for a couple of days now. It goes as follows.
I have an S3 location with a few Excel sheets containing unformatted data. I am writing a Lambda function to format them and convert them to CSV. I already have the code for this, but it works on my local machine, where I pick the Excel files up from a local directory, format/transform them, and put them into a target folder. We are using the openpyxl package for the transformation. Now I am migrating this to AWS, and that is where the problem comes in: instead of local directories, the source and target will be S3 locations.
The data transformation logic is quite lengthy, and I really don't want to rewrite it.
Is there a way I can handle these Excel files just like we do on a local machine?
For instance,
wb = openpyxl.load_workbook(r'C:\User\test.xlsx', data_only=True)
How can I recreate this statement, or what it does, in a Lambda with Python?
CodePudding user response:
You can do this with BytesIO like so:
from io import BytesIO

file = readS3('test.xlsx')  # load the file contents from S3 with Boto3
wb = openpyxl.load_workbook(BytesIO(file), data_only=True)
With readS3() implemented, for example, like this:
import boto3

bucket = 'my-source-bucket'  # placeholder: your source bucket name

def readS3(file):
    s3 = boto3.client('s3')
    # fetch the object and return its raw bytes
    s3_data = s3.get_object(Bucket=bucket, Key=file)
    return s3_data['Body'].read()
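That covers the reading side. Since your converted CSVs also need to land in a target S3 location, here is a minimal sketch of the write side; writeS3, target_bucket, and the assumption that your transform produces rows as lists of lists are illustrative, not part of the original answer:

import csv
from io import StringIO

import boto3

target_bucket = 'my-target-bucket'  # placeholder: your target bucket name

def writeS3(key, rows):
    # serialize the rows (a list of lists) to CSV entirely in memory
    buf = StringIO()
    csv.writer(buf).writerows(rows)
    # upload the encoded CSV text straight from memory, no temp file needed
    s3 = boto3.client('s3')
    s3.put_object(Bucket=target_bucket, Key=key, Body=buf.getvalue().encode('utf-8'))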
Configure Boto3 credentials as described in the quickstart guide: https://boto3.amazonaws.com/v1/documentation/api/latest/guide/quickstart.html
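Putting it together, a rough sketch of the Lambda handler, assuming the function is triggered by an S3 upload event and that transform() stands in for your existing openpyxl logic (both transform and the writeS3 helper above are placeholders, not from the original answer):

import os
from io import BytesIO
from urllib.parse import unquote_plus

import boto3
import openpyxl

def lambda_handler(event, context):
    s3 = boto3.client('s3')
    # bucket and key come from the S3 event notification; keys arrive URL-encoded
    record = event['Records'][0]['s3']
    bucket = record['bucket']['name']
    key = unquote_plus(record['object']['key'])
    body = s3.get_object(Bucket=bucket, Key=key)['Body'].read()
    wb = openpyxl.load_workbook(BytesIO(body), data_only=True)
    rows = transform(wb)  # your existing formatting/transforming logic, unchanged
    # write the result under the source name, with a .csv extension
    csv_key = os.path.splitext(key)[0] + '.csv'
    writeS3(csv_key, rows)  # the write helper sketched above

This way the transformation code itself stays untouched; only the file I/O at either end is swapped from local paths to in-memory buffers backed by S3.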