How does a Python AWS Lambda function interact with a specific uploaded file?

I'm trying to do the following: when I upload a JSON file to my S3 bucket, a Lambda function picks it up and converts it into a CSV file.

How can I specify in the Lambda code which file it must pick up?

Example of my code running locally:

import pandas as pd

df = pd.read_json('movies.json')
df.to_csv('csv-movies.csv')

In this example I provide the name of the file, but how can I manage that in a Lambda function?

I think I don't understand how Lambda works. Could you give me an example?

CodePudding user response：

Lambda spins up execution environments to handle your requests. When it initialises these environments, it'll pull the code you uploaded, and execute it when invoked.

Execution environments have ephemeral (temporary) storage with a default size of 512 MB.

Lambda doesn't have access to your files in S3 by default. You'd first need to download your file from S3 using something like the AWS SDK for Python. You can store it in the /tmp directory to make use of the ephemeral storage I mentioned earlier.

Once you've downloaded the file using the SDK, you can interact with it as you would if you were running this locally, like in your example.

On the flip side, you'd also need to use the SDK to upload the CSV back to S3 if you want to keep it beyond the lifecycle of that execution environment.
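A minimal sketch of that download, convert, upload round trip, under a few assumptions: `s3_client` is a boto3 S3 client created elsewhere with `boto3.client("s3")`, the function's execution role is allowed to read and write the bucket, and the JSON file holds an array of flat objects. The helper names are made up for illustration, and the standard-library `json` and `csv` modules stand in for pandas so the sketch has no third-party dependencies.

```python
import csv
import json
import os


def json_file_to_csv_file(json_path, csv_path):
    """Convert a JSON array of flat objects into a CSV file."""
    with open(json_path, encoding="utf-8") as src:
        rows = json.load(src)
    # Collect column names in first-seen order across all rows.
    fieldnames = list(dict.fromkeys(name for row in rows for name in row))
    with open(csv_path, "w", newline="", encoding="utf-8") as dst:
        writer = csv.DictWriter(dst, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(rows)


def convert_via_tmp(s3_client, bucket, key, tmp_dir="/tmp"):
    """Download s3://bucket/key, convert it locally, upload the CSV beside it."""
    json_path = os.path.join(tmp_dir, os.path.basename(key))
    csv_path = os.path.splitext(json_path)[0] + ".csv"
    csv_key = os.path.splitext(key)[0] + ".csv"

    s3_client.download_file(bucket, key, json_path)   # S3 -> ephemeral storage
    json_file_to_csv_file(json_path, csv_path)
    s3_client.upload_file(csv_path, bucket, csv_key)  # ephemeral storage -> S3
    return csv_key
```

If pandas is packaged alongside the function, the body of `json_file_to_csv_file` could instead be the two lines from the question, pointed at `json_path` and `csv_path`.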

Something else you might want to explore in future is reading that file into memory and doing away with storing it in ephemeral storage altogether.
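For the in-memory variant, here is a rough sketch of the conversion step alone. It assumes the object's bytes have already been read (for example from the `Body` of a `get_object` response) and that the JSON is an array of flat objects. `json_bytes_to_csv_bytes` is a hypothetical helper name, and the standard-library modules are used here in place of pandas.

```python
import csv
import io
import json


def json_bytes_to_csv_bytes(data):
    """Turn the bytes of a JSON array of flat objects into CSV bytes."""
    rows = json.loads(data)
    # Column names in first-seen order across all rows.
    fieldnames = list(dict.fromkeys(name for row in rows for name in row))
    buffer = io.StringIO()  # in-memory text buffer instead of a file in /tmp
    writer = csv.DictWriter(buffer, fieldnames=fieldnames)
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue().encode("utf-8")
```

The returned bytes can be handed straight to an upload call, so nothing touches the ephemeral storage.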

CodePudding user response：

To achieve this you will need to use S3 as the event source for your Lambda function. AWS itself provides a useful tutorial for this, with some sample Python code to assist you; you can view it here.

To break it down slightly further and answer how you get the name of the file: the Lambda handler will look similar to the following:

def lambda_handler(event, context):
    ...  # your processing logic goes here

What matters here is the event object. When the event source is an S3 bucket, the event contains the name of the bucket and the S3 key of the object, which is effectively the path to the file within the bucket. With this information you can decide whether you want to download the file at that path. If you do, you can use the S3 get_object() API call, as shown in the tutorial.
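To make the shape of that event concrete, here is a small sketch that pulls the bucket name and key out of an S3 notification. The bucket and file names in `sample_event` are invented, and a real event carries many more fields than shown. Keys arrive URL-encoded in the notification, which is why `unquote_plus` is applied.

```python
from urllib.parse import unquote_plus

# Trimmed-down example of what S3 passes to the handler (illustrative values).
sample_event = {
    "Records": [
        {
            "s3": {
                "bucket": {"name": "my-bucket"},
                "object": {"key": "uploads/my+movies.json"},
            }
        }
    ]
}


def extract_s3_objects(event):
    """Return a (bucket, key) pair for every record in an S3 event."""
    objects = []
    for record in event.get("Records", []):
        bucket = record["s3"]["bucket"]["name"]
        # A space in the file name shows up as '+' in the event, so decode it.
        key = unquote_plus(record["s3"]["object"]["key"])
        objects.append((bucket, key))
    return objects


def lambda_handler(event, context):
    # Keep only the JSON uploads; this is the "which file" the question asks about.
    json_objects = [
        (bucket, key)
        for bucket, key in extract_s3_objects(event)
        if key.lower().endswith(".json")
    ]
    return {"json_objects": json_objects}
```

So the file name is never hard-coded: it comes from whichever upload triggered the invocation.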

Once the file is downloaded it can be used like any other file on your local machine, so you can proceed to convert the JSON to CSV. Once it is converted you will presumably want to put it back in S3; for this you can use the S3 put_object() call, reusing the information in the event object to specify the path.
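Putting the pieces together, one possible shape for the whole flow is sketched below. It is not the tutorial's code. The assumptions are: `s3_client` is a boto3 S3 client (`boto3.client("s3")`) created once outside the function, `convert` is any callable you supply that turns JSON bytes into CSV bytes, and the `csv-` prefix on the output name simply mirrors the local example in the question. `csv_key_for` and `handle_event` are hypothetical names.

```python
import posixpath
from urllib.parse import unquote_plus


def csv_key_for(json_key):
    """Map 'path/movies.json' to 'path/csv-movies.csv'."""
    folder, filename = posixpath.split(json_key)
    stem = posixpath.splitext(filename)[0]
    return posixpath.join(folder, f"csv-{stem}.csv")


def handle_event(event, s3_client, convert):
    """Convert every uploaded .json object in the event; return the keys written."""
    written = []
    for record in event["Records"]:
        bucket = record["s3"]["bucket"]["name"]
        key = unquote_plus(record["s3"]["object"]["key"])
        if not key.lower().endswith(".json"):
            continue  # skip other uploads, including the CSVs written below
        # Read the uploaded object into memory.
        body = s3_client.get_object(Bucket=bucket, Key=key)["Body"].read()
        out_key = csv_key_for(key)
        # Write the converted bytes back to the same bucket under the new key.
        s3_client.put_object(
            Bucket=bucket, Key=out_key, Body=convert(body), ContentType="text/csv"
        )
        written.append(out_key)
    return written
```

Because the CSV lands in the same bucket, the upload raises a new S3 event of its own. The `.json` check makes the function ignore it, though restricting the trigger to a `.json` suffix or writing to a separate bucket would avoid the extra invocation altogether.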