I'm following this blog post to create a runtime environment using Docker for use with AWS Lambda. I'm creating a layer for use with Python 3.8:
```sh
docker run -v "$PWD":/var/task "lambci/lambda:build-python3.8" /bin/sh -c "pip install -r requirements.txt -t python/lib/python3.8/site-packages/; exit"
```
Then I archive the layer as a zip:

```sh
zip -9 -r mylayer.zip python
```
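As a quick sanity check, every path in the archive should start with `python/`, since Lambda extracts layers under `/opt` and puts `/opt/python` on the import path:

```sh
unzip -l mylayer.zip | head
```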
All standard so far. The problem is the layer's unzipped size, which exceeds 250 MB, so Lambda rejects it with the following error: `Failed to create layer version: Unzipped size must be smaller than 262144000 bytes`.
Here's my `requirements.txt`:

```
s3fs
scrapy
pandas
requests
```
I'm including `s3fs` because without it I get the following error when saving a parquet file to an S3 bucket using pandas: `[ERROR] ImportError: Install s3fs to access S3`. The problem is that including `s3fs` massively increases the layer size; without `s3fs` the layer is under 200 MB unzipped.
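For context, this is the kind of call that triggers that error when `s3fs` is missing (the frame and bucket name here are placeholders, not from the original post):

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2, 3]})

# pandas hands s3:// URLs off to s3fs; without it installed this raises
# "ImportError: Install s3fs to access S3"
df.to_parquet("s3://my-bucket/output.parquet")
```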
My most direct question is: how can I reduce the layer size to under 250 MB while still using Docker and keeping `s3fs` in my `requirements.txt`? I can't explain the roughly 50 MB difference, especially since `s3fs` itself is under 100 KB on PyPI.
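One way to account for the extra weight is to download `s3fs` together with its transitive dependencies and measure them (a diagnostic sketch; the target directory is an arbitrary choice):

```sh
pip download s3fs -d /tmp/s3fs-deps
du -sh /tmp/s3fs-deps
```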
Finally, for those questioning my use of Lambda with Scrapy: my scraper is trivial, and spinning up an EC2 instance would be overkill.
CodePudding user response:
The key idea behind shrinking your layers is to identify what `pip` installs and what you can get rid of, usually manually.
In your case, since you are only slightly above the limit, I would get rid of `pandas/tests`. Before you create your zip layer, run the following in the layer's folder (`mylayer` from your past question):
```sh
rm -rvf python/lib/python3.8/site-packages/pandas/tests
```
This should trim your layer below the 262144000-byte limit after unpacking. In my test the unpacked size came down to 244 MB.
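To confirm before re-zipping, you can measure the unpacked layer directly (assumes GNU `du`, which the build container provides):

```sh
du -sb python   # the total must stay below 262144000 bytes
```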
Alternatively, you can go through the `python` folder manually and remove any other tests, documentation, examples, and similar files that are not needed at runtime, as in the sketch below.
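As a sketch of that manual pass, something like the following removes the usual offenders; the patterns are assumptions, so review what matches before deleting, since a few packages import from similarly named paths:

```sh
# Remove test suites and bytecode caches from the layer contents.
find python -type d -name tests -prune -exec rm -rf {} +
find python -type d -name __pycache__ -prune -exec rm -rf {} +
find python -type f -name "*.pyc" -delete
```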