Crude Python tree shaking for squeezing into AWS Lambda package size limits
I have been working on a service, deployed to AWS Lambda, that manages a number of machine learning workloads. One of the issues I encountered deploying this service was that our dependencies on popular ML-related Python modules such as pandas and scikit-learn pushed the package past the maximum size imposed by AWS. AWS enforces a 250 MB hard limit on unzipped Lambda packages, and scikit-learn alone is 100 MB unzipped. Deploying yields:
ServerlessError: An error occurred: Resource handler returned message: "Unzipped size must be smaller than 262144000 bytes (Service: Lambda, Status Code: 400)"
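The 262144000 bytes in that message is 250 × 1024 × 1024. As a rough pre-deploy check, the unzipped size of an artefact can be summed locally before Lambda rejects it. This is a minimal sketch using only the standard library; the function names are my own for illustration and are not part of any AWS or Serverless tooling.

```python
import zipfile

# 250 MiB, the byte limit quoted in the deployment error message.
LAMBDA_UNZIPPED_LIMIT = 262_144_000


def unzipped_size(zip_path: str) -> int:
    """Sum the uncompressed sizes of all entries in a zip archive."""
    with zipfile.ZipFile(zip_path) as archive:
        return sum(info.file_size for info in archive.infolist())


def fits_in_lambda(zip_path: str, limit: int = LAMBDA_UNZIPPED_LIMIT) -> bool:
    """Return True if the archive unpacks to fewer bytes than the limit."""
    return unzipped_size(zip_path) < limit
```

Calling fits_in_lambda on the packaged zip gives a quick yes/no answer without a round trip to AWS.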
There don't seem to be many options for slimming down dependencies. I was hoping to find subtree splits of the submodules that were relevant to our project, but came up short. I had already configured the serverless-python-requirements plugin to slim down packages:
custom:
  pythonRequirements:
    slim: true
    strip: false
Note: the strip: false option was required to prevent an error during execution, which manifested as:
Runtime.ImportModuleError: Unable to import module 'handler': /var/task/scipy/linalg/_fblas.cpython-38-x86_64-linux-gnu.so: ELF load command address/offset not properly aligned
One option to reduce the footprint of dependencies is to implement some form of tree shaking to remove chunks of code we're not actually using. The tricky part is identifying which parts those are. In our case, we already had 100% test coverage, so we were able to run a coverage report against our dependency folder (instead of our src folder) to find out which parts of our dependencies we were actually using:
poetry run pytest --cov-report=html \
  --cov=/path/to/site-packages
Running this report indicates about 6% "coverage" of our dependencies.
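The per-file numbers behind that figure can also be read programmatically. The sketch below assumes the report is additionally written as JSON (for example with --cov-report=json, if your pytest-cov version supports it) and relies on the "files" and "summary" keys of coverage.py's JSON format. The function name is hypothetical, and skipping files with zero statements is my own precaution rather than something the original workflow did.

```python
from pathlib import PurePosixPath
from typing import List


def zero_coverage_files(report: dict, site_packages: str) -> List[str]:
    """Return paths, relative to site_packages, of files with no executed lines.

    `report` is the parsed content of a coverage.py JSON report.
    """
    root = PurePosixPath(site_packages)
    unused = []
    for filename, data in report["files"].items():
        summary = data["summary"]
        # Assumption: keep files with no statements (e.g. empty __init__.py)
        # so that package markers stay in place.
        if summary["num_statements"] == 0 or summary["covered_lines"] > 0:
            continue
        try:
            unused.append(str(PurePosixPath(filename).relative_to(root)))
        except ValueError:
            # The file lives outside site-packages; leave it alone.
            continue
    return sorted(unused)
```

Loading the JSON report with json.load and passing it in, together with the same /path/to/site-packages used for --cov, yields a newline-joinable list of candidates. Coverage only reflects the code paths the tests exercise, so the list is only as trustworthy as the test suite.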
From here, it was a simple matter of taking a snapshot of the files with 0% coverage and removing them from our artefact. To accomplish this, I found that the serverless-scriptable-plugin was required to remove dependencies added by the serverless-python-requirements plugin:
plugins:
  - serverless-python-requirements
  - serverless-scriptable-plugin
...
custom:
  pythonRequirements:
    slim: true
    strip: false
  scriptable:
    hooks:
      after:package:createDeploymentArtifacts:
        - ./shake.sh
#!/bin/sh
# shake.sh: delete every file with 0% coverage from the packaged artefact.
filelist="_distutils_hack/__init__.py
_distutils_hack/override.py
...
wheel/util.py
wheel/vendored/packaging/_typing.py
wheel/vendored/packaging/tags.py
wheel/wheelfile.py"
for file in $filelist
do
  zip --delete ./.serverless/data-science-api.zip "$file"
done
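With a list this long, invoking zip once per file can be slow, since each call updates the archive separately. One possible alternative, which is not what this setup used, is to rewrite the archive in a single pass. The sketch below uses Python's standard zipfile module; prune_zip is a hypothetical helper name.

```python
import os
import tempfile
import zipfile
from typing import Iterable


def prune_zip(zip_path: str, unwanted: Iterable[str]) -> int:
    """Rewrite zip_path without the entries named in `unwanted`.

    Returns the number of entries removed.
    """
    unwanted = set(unwanted)
    removed = 0
    # Build the new archive next to the old one, then swap it into place.
    directory = os.path.dirname(os.path.abspath(zip_path))
    fd, tmp_path = tempfile.mkstemp(suffix=".zip", dir=directory)
    os.close(fd)
    try:
        with zipfile.ZipFile(zip_path) as src, \
                zipfile.ZipFile(tmp_path, "w") as dst:
            for info in src.infolist():
                if info.filename in unwanted:
                    removed += 1
                    continue
                # Passing the ZipInfo keeps each entry's name, timestamp,
                # permissions and compression method.
                dst.writestr(info, src.read(info.filename))
        os.replace(tmp_path, zip_path)
    finally:
        # Clean up the temporary file if the swap did not happen.
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
    return removed
```

Names in the list that are not present in the archive are simply ignored, so the same list can be reused across builds.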
In the end 28,000 unnecessary files were removed and the artefact was reduced from 307 MB to 236 MB, unblocking deployment.