I'm following this blog post to create a runtime environment using Docker for use with AWS Lambda. I'm creating a layer for use with Python 3.8:
```sh
docker run -v "$PWD":/var/task "lambci/lambda:build-python3.8" /bin/sh -c "pip install -r requirements.txt -t python/lib/python3.8/site-packages/; exit"
```
Then I archive the layer as a zip:

```sh
zip -9 -r mylayer.zip python
```
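As a quick sanity check, every path in the archive should start with `python/`, since Lambda extracts layers under `/opt` and puts `/opt/python` on the import path:

```sh
unzip -l mylayer.zip | head
```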
All standard so far. The problem is the layer's unzipped size, which exceeds 250 MB, so Lambda rejects it with the following error: `Failed to create layer version: Unzipped size must be smaller than 262144000 bytes`.
Here's my `requirements.txt`:

```
s3fs
scrapy
pandas
requests
```
I'm including `s3fs` because without it I get the following error when saving a parquet file to an S3 bucket using pandas: `[ERROR] ImportError: Install s3fs to access S3`. The problem is that including `s3fs` massively increases the layer size; without `s3fs` the layer is under 200 MB unzipped.
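For context, this is the kind of call that triggers that error when `s3fs` is missing (the frame and bucket name here are placeholders, not from the original post):

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2, 3]})

# pandas hands s3:// URLs off to s3fs; without it installed this raises
# "ImportError: Install s3fs to access S3"
df.to_parquet("s3://my-bucket/output.parquet")
```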
My most direct question is: how can I reduce the layer size to under 250 MB while still using Docker and keeping `s3fs` in my `requirements.txt`? I can't explain the roughly 50 MB difference, especially since `s3fs` itself is under 100 KB on PyPI.
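One way to account for the extra weight is to download `s3fs` together with its transitive dependencies and measure them (a diagnostic sketch; the target directory is an arbitrary choice):

```sh
pip download s3fs -d /tmp/s3fs-deps
du -sh /tmp/s3fs-deps
```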
Finally, for those questioning my use of Lambda with Scrapy: my scraper is trivial, and spinning up an EC2 instance would be overkill.
CodePudding user response:
The key idea behind shrinking your layers is to identify what `pip` installs and what you can get rid of, usually manually.
In your case, since you are only slightly above the limit, I would get rid of `pandas/tests`. Before you create your zip layer, run the following in the layer's folder (`mylayer` from your past question):
```sh
rm -rvf python/lib/python3.8/site-packages/pandas/tests
```
This should trim your layer below the 262144000-byte limit after unpacking. In my test the unpacked size came down to 244 MB.
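To confirm before re-zipping, you can measure the unpacked layer directly (assumes GNU `du`, which the build container provides):

```sh
du -sb python   # the total must stay below 262144000 bytes
```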
Alternatively, you can go through the `python` folder manually and remove any other tests, documentation, examples, and similar files that are not needed at runtime, as in the sketch below.
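As a sketch of that manual pass, something like the following removes the usual offenders; the patterns are assumptions, so review what matches before deleting, since a few packages import from similarly named paths:

```sh
# Remove test suites and bytecode caches from the layer contents.
find python -type d -name tests -prune -exec rm -rf {} +
find python -type d -name __pycache__ -prune -exec rm -rf {} +
find python -type f -name "*.pyc" -delete
```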