AWS officially introduced Amazon S3 Object Lambda.
What is it?
“A new capability that allows you to add your own code to process data retrieved from S3 before returning it to an application. S3 Object Lambda works with your existing applications and uses AWS Lambda functions to automatically process and transform your data as it is being retrieved from S3. The Lambda function is invoked inline with a standard S3 GET request, so you don’t need to change your application code. In this way, you can easily present multiple views from the same dataset, and you can update the Lambda functions to modify these views at any time.”
Danilo Poccia
AWS S3 Object Lambda allows you to pipe a file through a Lambda invocation on read. This means that your applications (or customers) will read that file through a lens that you can define.
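Here is a minimal sketch of what such a lens can look like. It assumes a Python Lambda function attached to an S3 Object Lambda Access Point. The event's getObjectContext carries a presigned inputS3Url for the original object, plus an outputRoute and an outputToken. The function sends the transformed bytes back through the S3 WriteGetObjectResponse API (write_get_object_response in boto3). The upper-casing transform is a hypothetical placeholder. The s3_client and fetch parameters are there only so the function can be exercised locally without AWS.

```python
import urllib.request


def _fetch(url: str) -> bytes:
    """Download the original object from the presigned URL S3 puts in the event."""
    with urllib.request.urlopen(url) as resp:
        return resp.read()


def transform(data: bytes) -> bytes:
    """The 'lens'. Hypothetical placeholder: upper-case the object's bytes."""
    return data.upper()


def handler(event, context, s3_client=None, fetch=_fetch):
    """Invoked inline with a GET on an S3 Object Lambda Access Point.

    s3_client and fetch are injectable for local testing; in Lambda the
    defaults (a real boto3 client and an HTTP download) are used.
    """
    if s3_client is None:
        import boto3  # included in the AWS Lambda Python runtimes
        s3_client = boto3.client("s3")

    ctx = event["getObjectContext"]
    original = fetch(ctx["inputS3Url"])  # bytes of the object as stored in S3

    # Return the transformed view to the caller that issued the GET.
    s3_client.write_get_object_response(
        Body=transform(original),
        RequestRoute=ctx["outputRoute"],  # routes the response back to the caller
        RequestToken=ctx["outputToken"],  # ties the response to this GET request
    )
    return {"statusCode": 200}
```

The caller never sees the original bytes. Swapping transform for a different function changes every reader's view without touching the stored object.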
Why does S3 Object Lambda matter?
S3 Object Lambda is fantastic for a specific set of use cases. However, just because it’s there doesn’t mean you’ll need it. In fact, our early work with the service suggests you may not.
When is it worth executing an S3 Object Lambda invocation for each reader vs. saving a copy of the file?
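One rough way to frame that question is as a break-even comparison. You either pay for a transform on every read, or you pay once per variant to transform it and then store the copies. The sketch below is an illustrative model, not AWS pricing. Every cost input is a hypothetical number you would supply. It also ignores the GET charges both approaches share and the staleness of precomputed copies.

```python
def on_read_cost(reads: int, cost_per_invocation: float) -> float:
    """Total cost of running the Object Lambda transform once per GET."""
    return reads * cost_per_invocation


def precomputed_cost(variants: int, cost_per_transform: float,
                     storage_cost_per_variant_month: float, months: float) -> float:
    """Cost of transforming each variant once and then storing the copies."""
    return variants * (cost_per_transform + storage_cost_per_variant_month * months)


def object_lambda_is_cheaper(reads: int, cost_per_invocation: float,
                             variants: int, cost_per_transform: float,
                             storage_cost_per_variant_month: float,
                             months: float) -> bool:
    """True when transforming on every read costs less than storing copies."""
    return on_read_cost(reads, cost_per_invocation) < precomputed_cost(
        variants, cost_per_transform, storage_cost_per_variant_month, months)
```

Many reads of a few variants push the answer toward stored copies. Many variants that are each read rarely push it toward Object Lambda.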
Let’s look at some standard use cases where S3 Object Lambda will shine:
- DRM. You have many readers interested in small variations of the same content.
- GTFS Transit Feeds. You have highly dynamic data. By the time you have made enough copies for all your read cases, they are already stale.
- Variable Format File Types. You occasionally need to generate different formats of the same content (e.g. PDF, ebook, MOBI, XML).
- Stripping PII from Audit Logs. The amount of data you need to read and process is significantly smaller than the amount you write, so redacting on read is cheaper than redacting everything up front.
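For the audit-log case, the transformation step is a function from the stored bytes to the redacted bytes. This function could serve as that step inside an S3 Object Lambda handler. It is a minimal sketch that assumes UTF-8 text logs and covers only two hypothetical PII patterns: e-mail addresses and US SSN-shaped numbers. A real deployment would need patterns matched to its own log schema.

```python
import re

# Hypothetical PII patterns; tune these to your own audit-log format.
_EMAIL = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
_US_SSN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")


def strip_pii(log_bytes: bytes) -> bytes:
    """Redact e-mail addresses and SSN-shaped values from a UTF-8 log."""
    text = log_bytes.decode("utf-8")
    text = _EMAIL.sub("[REDACTED_EMAIL]", text)
    text = _US_SSN.sub("[REDACTED_SSN]", text)
    return text.encode("utf-8")


# strip_pii(b"user=jane@example.com ssn=123-45-6789 action=login")
# returns b"user=[REDACTED_EMAIL] ssn=[REDACTED_SSN] action=login"
```

The full logs are still written once, unredacted. Only the comparatively small set of reads pays for the redaction.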
Conversely, here are a few (also standard) use cases where S3 Object Lambda may not be ideal:
- File Interaction with the CLI. High-level S3 CLI commands are not supported with S3 Object Lambda, which rules it out if, for example, your flow relies on aws s3 cp s3://bucket/thing.
- Database Replacement. You can undoubtedly run selects and filters on the file content, but we do not recommend it. Consider a database instead, or a different file partition strategy to optimize performance and minimize cost.
- Serving Website Images or Thumbnails. The best practice for these assets is to build them once for everybody. Even if you are building a simple site like placekitten.com, please resist the temptation: you will very likely end up investing more than you gain. If you need a solution here, we recommend CacheFly or CloudFront Lambda@Edge.
- Low Read Cardinality. Where many reads hit the same asset (e.g. your company’s internal benefits documentation), we likewise recommend CacheFly or CloudFront Lambda@Edge.
- Frequently Requested / Long Lifetime (TTL) Files. DocuSign documents that you want to be able to reference for a very long time are a great example: the output rarely changes, so storing a processed copy once beats re-processing it on every read.
- Machine Learning Data Pipelines. For myriad reasons, we don’t recommend using S3 Object Lambda to filter fields or for field-level access control for machine learning data pipelines.
- Cache. Whenever using a cache would make sense, use a cache!
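On the consumer side, an application keeps issuing ordinary GetObject requests, but against an Object Lambda Access Point instead of the bucket. High-level commands like aws s3 cp are not supported, so a flow that needs the transformed view has to make that call itself. Below is a minimal boto3 sketch. It assumes an Object Lambda Access Point already exists. The ARN, account ID, and key in the usage comment are hypothetical.

```python
def read_through_object_lambda(s3_client, access_point_arn: str, key: str) -> bytes:
    """GET an object through an S3 Object Lambda Access Point.

    This is the standard GetObject call; only the Bucket argument changes,
    from a bucket name to the Object Lambda Access Point ARN.
    """
    response = s3_client.get_object(Bucket=access_point_arn, Key=key)
    return response["Body"].read()  # the bytes returned by the Lambda lens


# Usage (hypothetical ARN and key):
# import boto3
# s3 = boto3.client("s3")
# data = read_through_object_lambda(
#     s3,
#     "arn:aws:s3-object-lambda:us-east-1:111122223333:accesspoint/example-olap",
#     "logs/audit.log",
# )
```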
With any new AWS service, or any cloud service for that matter, there’s always a period of rapid evolution immediately after launch. We plan to push S3 Object Lambda as far as we can to determine where its ideal use cases lie for our clients.
Need guidance? Talk to our team!