Python rookie here. I have a requirement that I have been researching for a couple of days now. It goes as follows.
I have an S3 location with a few Excel sheets containing unformatted data. I am writing a Lambda function to format them and convert them to CSV. I already have the code for this, but it works on my local machine, where I pick the Excel files up from a local directory, format/transform them, and put them into a target folder. We are using the openpyxl package for the transformation. Now I am migrating this to AWS, and that is where the problem comes in: instead of local directories, the source and target will be S3 locations.
The data transformation logic is quite lengthy, and I really don't want to rewrite it.
Is there a way I can handle these Excel files just like we do on a local machine?
For instance,
wb = openpyxl.load_workbook(r'C:\User\test.xlsx', data_only=True)
How can I recreate this statement, or what it does, in a Lambda with Python?
CodePudding user response:
You can do this with BytesIO like so:
from io import BytesIO

file = readS3('test.xlsx')  # load the file contents from S3 with Boto3
wb = openpyxl.load_workbook(BytesIO(file), data_only=True)
With readS3() implemented, for example, like this:
import boto3

bucket = 'my-source-bucket'  # placeholder: your source bucket name

def readS3(file):
    s3 = boto3.client('s3')
    # fetch the object and return its raw bytes
    s3_data = s3.get_object(Bucket=bucket, Key=file)
    return s3_data['Body'].read()
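That covers the reading side. Since your converted CSVs also need to land in a target S3 location, here is a minimal sketch of the write side; writeS3, target_bucket, and the assumption that your transform produces rows as lists of lists are illustrative, not part of the original answer:

import csv
from io import StringIO

import boto3

target_bucket = 'my-target-bucket'  # placeholder: your target bucket name

def writeS3(key, rows):
    # serialize the rows (a list of lists) to CSV entirely in memory
    buf = StringIO()
    csv.writer(buf).writerows(rows)
    # upload the encoded CSV text straight from memory, no temp file needed
    s3 = boto3.client('s3')
    s3.put_object(Bucket=target_bucket, Key=key, Body=buf.getvalue().encode('utf-8'))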
Configure Boto3 credentials as described in the quickstart guide: https://boto3.amazonaws.com/v1/documentation/api/latest/guide/quickstart.html
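Putting it together, a rough sketch of the Lambda handler, assuming the function is triggered by an S3 upload event and that transform() stands in for your existing openpyxl logic (both transform and the writeS3 helper above are placeholders, not from the original answer):

import os
from io import BytesIO
from urllib.parse import unquote_plus

import boto3
import openpyxl

def lambda_handler(event, context):
    s3 = boto3.client('s3')
    # bucket and key come from the S3 event notification; keys arrive URL-encoded
    record = event['Records'][0]['s3']
    bucket = record['bucket']['name']
    key = unquote_plus(record['object']['key'])
    body = s3.get_object(Bucket=bucket, Key=key)['Body'].read()
    wb = openpyxl.load_workbook(BytesIO(body), data_only=True)
    rows = transform(wb)  # your existing formatting/transforming logic, unchanged
    # write the result under the source name, with a .csv extension
    csv_key = os.path.splitext(key)[0] + '.csv'
    writeS3(csv_key, rows)  # the write helper sketched above

This way the transformation code itself stays untouched; only the file I/O at either end is swapped from local paths to in-memory buffers backed by S3.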