I'm trying to develop a simple Lambda function that downloads a PDF and saves it to an S3 bucket, given the URL and the desired filename as input data. I keep receiving the error "Read-only file system," and I'm not sure whether I have to change the bucket permissions or whether I'm missing something else. I am new to S3 and Lambda and would appreciate any help.
This is my code:
import urllib.request
import json
import boto3

def lambda_handler(event, context):
    s3 = boto3.client('s3')
    url = event['url']
    filename = event['filename'] + ".pdf"
    response = urllib.request.urlopen(url)
    file = open(filename, 'w')
    file.write(response.read())
    s3.upload_fileobj(response.read(), 'sasbreports', filename)
    file.close()
This was my event file:
{
    "url": "https://purpose-cms-preprod01.s3.amazonaws.com/wp-content/uploads/2022/03/09205150/FY21-NIKE-Impact-Report_SASB-Summary.pdf",
    "filename": "nike"
}
When I tested the function, I received this error:
{
    "errorMessage": "[Errno 30] Read-only file system: 'nike.pdf.pdf'",
    "errorType": "OSError",
    "requestId": "de0b23d3-1e62-482c-bdf8-e27e82251941",
    "stackTrace": [
        " File \"/var/task/lambda_function.py\", line 15, in lambda_handler\n file = open(filename + \".pdf\", 'w')\n"
    ]
}
CodePudding user response:
AWS Lambda functions can only write to the /tmp/ directory. All other directories are read-only.
Also, there is a default limit of 512 MB for storage in /tmp/, so make sure you delete the files after uploading them to S3, because the Lambda environment may be reused for future executions.
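For illustration, here is a minimal sketch of that approach: write the download to a temporary file, upload it, and always delete it afterwards. The helper name save_pdf_via_tmp and its opener parameter are hypothetical, introduced only to keep the example easy to try out. It assumes that tempfile's default directory is /tmp inside Lambda, and it reuses the bucket name from the question.

```python
import os
import tempfile
import urllib.request


def save_pdf_via_tmp(url, bucket, key, s3_client, opener=urllib.request.urlopen):
    """Download url into a temp file, upload it to S3, then delete the file.

    Hypothetical helper: s3_client is anything with an upload_file method
    (for example boto3.client('s3')).
    """
    # mkstemp picks a unique name, so a reused (warm) environment never
    # collides with a file left over from an earlier invocation.
    # Assumption: the default temp directory is /tmp inside Lambda.
    fd, tmp_path = tempfile.mkstemp(suffix=".pdf", dir=tempfile.gettempdir())
    try:
        # Open in binary mode: response.read() returns bytes, not str.
        with os.fdopen(fd, "wb") as tmp_file, opener(url) as response:
            tmp_file.write(response.read())
        s3_client.upload_file(tmp_path, bucket, key)
    finally:
        # Free the space even if the download or upload failed.
        os.remove(tmp_path)
    return tmp_path


def lambda_handler(event, context):
    # Imported here only so the helper above can be tried without boto3.
    import boto3

    s3 = boto3.client("s3")
    save_pdf_via_tmp(event["url"], "sasbreports", event["filename"] + ".pdf", s3)
```

The try/finally is what keeps /tmp from filling up across executions: the file is removed whether or not the upload succeeds.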
CodePudding user response:
AWS Lambda has limited space in /tmp, the sole writable location.
Writing into this space can be dangerous without proper disk management, since this storage is kept alive across multiple executions. It can fill up, or files left by previous requests can be picked up unintentionally.
Instead of saving the PDF locally, write it directly to S3 without involving the file system, like this:
import urllib.request
import boto3

def lambda_handler(event, context):
    s3 = boto3.client('s3')
    url = event['url']
    filename = event['filename']
    # upload_fileobj expects a file-like object with read(), not bytes,
    # so pass the HTTP response itself and let boto3 stream it to S3.
    with urllib.request.urlopen(url) as response:
        s3.upload_fileobj(response, 'sasbreports', filename)
BTW: I removed the .pdf appending here; keep or drop it according to your use case.
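The 'nike.pdf.pdf' in the error message suggests the extension was appended twice at some point. If the input may or may not already include it, a small guard like the one below could help. This is only a sketch, and ensure_pdf_suffix is a hypothetical helper name, not something from boto3 or the original code.

```python
def ensure_pdf_suffix(name):
    """Append '.pdf' unless name already ends with it (case-insensitive)."""
    if name.lower().endswith(".pdf"):
        return name
    return name + ".pdf"


# Both forms of input end up with a single extension.
print(ensure_pdf_suffix("nike"))
print(ensure_pdf_suffix("nike.pdf"))
```

The result can then be used as the S3 key in place of the raw event['filename'].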