I have code that extracts a CSV from inside a zip file containing nested folders. This code works great when the zip file is local, but we are moving the files to our S3 bucket. I'd like to apply my existing code within an AWS Lambda function, but I'm unsure how to access the zip file from S3. The code was originally designed to operate on multiple zip files, hence the looping; in this new case it will only ever be a single zip file, but for simplicity's sake I've copied the existing code, loop included.
Code snippet with missing code below:
import os
from zipfile import ZipFile

import pandas as pd

all_files = NEED CODE THAT GRABS ZIP FILE STORED ON S3

frames = []
for file in all_files:
    with ZipFile(file) as zip:
        for zip_info in zip.infolist():
            # Skip directory entries inside the archive
            if zip_info.filename[-1] == '/':
                continue
            # Flatten the nested folder paths down to the bare file name
            zip_info.filename = os.path.basename(zip_info.filename)
            frames.append(pd.read_csv(zip.open(zip_info), encoding='ANSI', dtype='str'))

master_df = pd.concat(frames, ignore_index=True)
master_df.drop(master_df.columns[0], axis=1, inplace=True)
master_df.columns = [x.replace("\n", "") for x in master_df.columns.str.strip().to_list()]
master_df.rename(columns={'Date': 'REPORT_PERIOD',
                          'Ad Impressions': 'IMPRESSIONS',
                          'Publisher Currency Revenue': 'REVENUE'}, inplace=True)
CodePudding user response:
The easiest method would be to use download_file() to download the zip file to the /tmp/ directory, which is the only writeable location in AWS Lambda functions.
For example:
import boto3

s3_client = boto3.client('s3')

# Download the object to Lambda's writeable scratch space
s3_client.download_file('bucket-name', 'file.zip', '/tmp/file.zip')
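If you keep the original multi-file looping, you can enumerate the zip keys first and download each one to /tmp/, building the all_files list your loop expects. A minimal sketch, reusing the placeholder bucket name from above and a hypothetical reports/ prefix:

import boto3

s3_client = boto3.client('s3')
all_files = []

# List every .zip key under the (hypothetical) prefix and download each to /tmp/
paginator = s3_client.get_paginator('list_objects_v2')
for page in paginator.paginate(Bucket='bucket-name', Prefix='reports/'):
    for obj in page.get('Contents', []):
        key = obj['Key']
        if key.endswith('.zip'):
            local_path = '/tmp/' + key.split('/')[-1]
            s3_client.download_file('bucket-name', key, local_path)
            all_files.append(local_path)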
Your code can then access the local /tmp/file.zip file (or each entry of all_files in the multi-file sketch) as normal.
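Alternatively, since ZipFile accepts any file-like object, the file does not have to touch disk at all: you can read the object into memory and wrap it in a buffer. A sketch, using the same placeholder bucket and key:

import io
from zipfile import ZipFile

import boto3

s3_client = boto3.client('s3')

# Read the whole object into memory and wrap it in a file-like buffer;
# ZipFile accepts any file-like object, so no /tmp/ download is needed.
response = s3_client.get_object(Bucket='bucket-name', Key='file.zip')
buffer = io.BytesIO(response['Body'].read())

with ZipFile(buffer) as zip:
    print(zip.namelist())  # e.g. inspect the archive before reading the CSV

The trade-off is that the entire archive is held in memory, so this suits zip files that fit comfortably within the function's memory allocation.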
Also, please note that the Lambda execution environment might be reused for future invocations. Therefore, it is generally a good idea to delete any downloaded files; otherwise they can accumulate and consume all of the storage available in /tmp/ (512MB by default).
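One way to make that cleanup reliable is a try/finally around the processing, sketched here with the same placeholder path:

import os

try:
    # ... process /tmp/file.zip here ...
    pass
finally:
    # Delete the download so a reused (warm) execution environment
    # does not accumulate files in /tmp/
    if os.path.exists('/tmp/file.zip'):
        os.remove('/tmp/file.zip')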