I am trying to create a Lambda function that automatically cleans CSV files from an S3 bucket. The bucket receives new files every 5 minutes, so I have set up an S3 trigger for the Lambda function. To clean the CSV files I use the pandas library to create a DataFrame, and I have already installed a pandas layer. However, when creating the DataFrame I get an error. This is my code:
import json
import boto3
import pandas as pd
from io import StringIO

#create the s3 client
client = boto3.client('s3')

def lambda_handler(event, context):
    #define bucket_name and object_name from the event record
    bucket_name = event['Records'][0]['s3']['bucket']['name']
    object_name = event['Records'][0]['s3']['object']['key']
    #create a df from the object
    df = pd.read_csv(object_name)
This is the error message:
[ERROR] FileNotFoundError: [Errno 2] No such file or directory: 'object_name'
On CloudWatch it additionally says:
OpenBLAS WARNING - could not determine the L2 cache size on this system, assuming 256k
Has anyone experienced the same issues? Thanks in advance for all your help!
CodePudding user response:
You have to use the S3 client to download the object from S3 before handing it to pandas. Something like:
response = client.get_object(Bucket=bucket_name, Key=object_name)
df = pd.read_csv(response["Body"])
You'll also have to make sure the Lambda function's execution role has permission to read from the bucket (s3:GetObject).
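For completeness, here is a minimal sketch of the whole handler along these lines; get_object and reading the streaming Body with pandas are standard boto3/pandas usage, and the unquote_plus call is there because S3 event notifications URL-encode object keys:

import boto3
import pandas as pd
from urllib.parse import unquote_plus

client = boto3.client('s3')

def lambda_handler(event, context):
    bucket_name = event['Records'][0]['s3']['bucket']['name']
    # keys arrive URL-encoded in the event, so decode them first
    object_name = unquote_plus(event['Records'][0]['s3']['object']['key'])
    # download the object and feed its body (a file-like stream) to pandas
    response = client.get_object(Bucket=bucket_name, Key=object_name)
    df = pd.read_csv(response['Body'])
    # ... clean the dataframe here ...
    return {'statusCode': 200, 'rows': len(df)}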
CodePudding user response:
Change this line:
df = pd.read_csv("object_name")
to this:
df = pd.read_csv(object_name)
CodePudding user response:
Cause of error
object_name is just the key (a relative path) of the S3 object within its bucket, and it has no meaning on its own without bucket_name. pandas therefore treats it as a local file path, and since no such local file exists you get the FileNotFoundError.
Solution for the error
In order to refer to the S3 object properly, you have to construct the fully qualified S3 path from bucket_name and object_name. Also note that the object key arrives URL-quoted in the event, so you have to unquote it before building the path.
from urllib.parse import unquote_plus

def lambda_handler(event, context):
    #define bucket_name and object_name from the event record
    bucket_name = event['Records'][0]['s3']['bucket']['name']
    object_name = event['Records'][0]['s3']['object']['key']
    #build the fully qualified s3 path and create a df from the object
    filepath = f's3://{bucket_name}/{unquote_plus(object_name)}'
    df = pd.read_csv(filepath)
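Note that pd.read_csv only understands s3:// URLs if the s3fs package is importable in the Lambda runtime, so it needs to be bundled in your layer alongside pandas. The unquoting matters because the event notification encodes spaces and special characters in the key; a quick illustration:

from urllib.parse import unquote_plus

# S3 event notifications URL-encode keys ('+' for spaces, %XX escapes)
print(unquote_plus('my+report%282022%29.csv'))  # -> 'my report(2022).csv'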