Nikhil Araga is a DevOps Engineer on the Cloud Migrations & DevOps team. He is proud to be part of a team that religiously follows Agile and has a SHIFT-LEFT mindset. He is actively working on multiple CI/CD pipelines and stable deployment patterns. His key areas of interest include Cloud Security, FinOps, and DevOps practices.
This article provides a detailed overview of invoking AWS Lambda using S3 events and also highlights a few general use cases.
What are Events in S3?
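AWS S3 can send notifications when certain events happen in your bucket, such as an object being created or deleted. You enable these notifications by adding an event notification configuration to the bucket, which specifies the event types to watch for and a destination (such as a Lambda function, an SNS topic, or an SQS queue) that acts on each event.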
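To make the idea of an event configuration concrete, here is a minimal Python sketch that builds one programmatically. It assumes the dictionary shape accepted by boto3's `put_bucket_notification_configuration`; the helper name `lambda_notification_config`, the default `Id`, and the example ARN are hypothetical placeholders.

```python
from typing import Any, Dict, List, Optional, Sequence


def lambda_notification_config(
    function_arn: str,
    events: Sequence[str] = ("s3:ObjectCreated:*",),
    prefix: Optional[str] = None,
    suffix: Optional[str] = None,
    config_id: str = "invoke-lambda-on-upload",  # hypothetical Id
) -> Dict[str, Any]:
    """Build a bucket notification configuration that targets one Lambda function."""
    # Optional key-name filters: only keys starting with `prefix` and/or
    # ending with `suffix` generate events.
    rules: List[Dict[str, str]] = []
    if prefix:
        rules.append({"Name": "prefix", "Value": prefix})
    if suffix:
        rules.append({"Name": "suffix", "Value": suffix})

    entry: Dict[str, Any] = {
        "Id": config_id,
        "LambdaFunctionArn": function_arn,
        "Events": list(events),
    }
    if rules:
        entry["Filter"] = {"Key": {"FilterRules": rules}}
    return {"LambdaFunctionConfigurations": [entry]}


# Example: all object-create events for keys ending in ".jpg".
config = lambda_notification_config(
    "arn:aws:lambda:us-east-1:123456789012:function:demo",  # placeholder ARN
    suffix=".jpg",
)
```

With boto3, this dictionary would be passed as `s3.put_bucket_notification_configuration(Bucket="my-bucket", NotificationConfiguration=config)`. Keep in mind that this call replaces the bucket's entire notification configuration and, unlike the console, does not grant S3 permission to invoke the function; that permission must already exist (for example, added with Lambda's `add_permission` using the principal `s3.amazonaws.com`).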
We will use a simple PUT-based event to trigger an AWS Lambda function. Let’s get started…
Invoking Lambda using S3 Events
Create a new S3 bucket or navigate to an existing one.
Navigate to the Properties tab >> Event Notifications. You can create new event-based notifications here.
Click on Create new notifications.
Provide a name for the event notification. We are not using a prefix here, but you can provide one so that only objects whose keys begin with that prefix (for example, images/) generate events.
Optionally, provide a suffix such as a file extension (for example, .jpg). We are not using one, which means events are generated for all objects in the bucket.
Select the event types. We are checking All object create events; you can select other event types if necessary.
Choose Lambda function as the destination. You can also use an SNS topic or an SQS queue (we will discuss these in a different post).
Now for the moment of truth: choose one of your Lambda functions or specify a Lambda function ARN.
If you haven’t created a Lambda function yet, no worries; open the Lambda console in a new tab.
Create a new function. We are using Python 3.8 as the runtime; keep all the default configurations and then create the function.
The Python code below just prints and returns the event. Nothing fancy here!
Update your Lambda handler code to include the following snippet.
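A minimal sketch of such a handler, assuming the console's default `lambda_function.py` file and `lambda_handler` entry point:

```python
import json


def lambda_handler(event, context):
    # Print the incoming S3 event as a single JSON line; Lambda sends
    # stdout to the function's CloudWatch Logs log group.
    print(json.dumps(event))
    # Return the event unchanged so it also shows up in test invocations.
    return event
```

You can also run it from the console's Test tab with a sample S3 event before wiring up the bucket.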
Go back to the S3 console, where you can now select the newly created Lambda function from the list (or paste its ARN), and save the notification.
Voila, everything is set up. Now let’s test the waters.
Upload a file to your bucket.
Go back to the Lambda function console. Navigate to the Monitor tab >> View Logs in CloudWatch. This opens the function’s log group, named /aws/lambda/ followed by your function’s name, which contains its log streams.
Open the latest log stream, and you can see the output populated there.
Expanding the JSON data, we can see some key details, such as the event name, the bucket name, and the uploaded object’s key and size. This is just a starting point and can be extended to many use cases.
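Once you know which fields matter, a handler usually extracts them rather than printing the whole event. A minimal sketch, assuming the standard S3 event record layout (`Records[].s3.bucket.name`, `Records[].s3.object.key`); the helper `summarize_s3_event` and the trimmed sample event are hypothetical.

```python
from typing import Any, Dict, List
from urllib.parse import unquote_plus


def summarize_s3_event(event: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Pull the most useful fields out of each record in an S3 event."""
    summaries = []
    for record in event.get("Records", []):
        s3 = record["s3"]
        summaries.append({
            "event": record["eventName"],  # e.g. "ObjectCreated:Put"
            "bucket": s3["bucket"]["name"],
            # Keys are URL-encoded in the event (a space arrives as "+").
            "key": unquote_plus(s3["object"]["key"]),
            # Some event types (such as deletes) omit size, hence .get().
            "size": s3["object"].get("size"),
        })
    return summaries


sample = {
    "Records": [{
        "eventName": "ObjectCreated:Put",
        "s3": {
            "bucket": {"name": "my-bucket"},
            "object": {"key": "reports/my+file%282%29.txt", "size": 1024},
        },
    }]
}
print(summarize_s3_event(sample))
```

Decoding the key with `unquote_plus` matters as soon as object names contain spaces or special characters; passing the raw key to a follow-up S3 call would otherwise fail to find the object.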
S3 invokes the Lambda function asynchronously, and the event message is passed to the handler as its event argument.
If you set up S3 notifications for a Lambda function via the console, the console adds the necessary permission to the function’s resource-based policy so that the S3 bucket can invoke it.
According to AWS, S3 event notifications are typically delivered within seconds but can sometimes take a minute or longer.
Event notifications are designed to be delivered at least once. On rare occasions, the same event may be delivered more than once, so handlers should tolerate duplicates.
If two writes are made to a single non-versioned object at the same time, it is possible that only a single event notification will be sent.
To make sure an event notification is sent for every successful write, enable versioning on the bucket. With versioning, every successful write creates a new object version and sends its own event notification.
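Because delivery is at least once and notifications can arrive out of order, handlers are usually written to be idempotent. A minimal sketch, assuming each record carries the `sequencer` value that S3 includes in object events (AWS documents it as a way to order events for the same object key); `EventDeduplicator` and `sequencer_is_newer` are hypothetical helpers.

```python
from typing import Any, Dict, Set, Tuple


def sequencer_is_newer(a: str, b: str) -> bool:
    """True if sequencer `a` is later than `b` for the same object key.

    Sequencers are hexadecimal strings that may differ in length, so they
    are compared as integers rather than as raw strings.
    """
    return int(a, 16) > int(b, 16)


class EventDeduplicator:
    """Hypothetical in-memory duplicate filter for S3 event records.

    State lives only in one Lambda execution environment; a durable store
    (for example, a DynamoDB conditional write) is needed to catch
    duplicates delivered to a different environment.
    """

    def __init__(self) -> None:
        self._seen: Set[Tuple[str, str, str]] = set()

    def is_new(self, record: Dict[str, Any]) -> bool:
        s3 = record["s3"]
        # A redelivered event carries the same bucket, key and sequencer.
        marker = (s3["bucket"]["name"], s3["object"]["key"], s3["object"]["sequencer"])
        if marker in self._seen:
            return False
        self._seen.add(marker)
        return True
```

Comparing sequencers only makes sense for events on the same key; they say nothing about ordering across different objects.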
This article covers the basics of getting started with S3 event notifications and triggering Lambda. This workflow can be used for many use cases, such as automation and mobile or serverless apps, and these event triggers can greatly extend automation on top of AWS. The flexibility of S3, combined with Lambda’s support for diverse programming languages, makes them a great fit for a wide range of automation scenarios on AWS. Make sure to play around with S3 event notifications. We will discuss more S3 events in upcoming blog posts. If you are new to AWS Lambda, check out how AWS Lambda Invoke works.
This article provides a detailed overview of best practices for improving AWS S3 performance and also highlights a few general use cases.
AWS S3 Design Patterns for Optimizing Performance
AWS S3 is widely adopted by users whose day-to-day work revolves around storage, cloud operations, and deployments. S3 is highly scalable and supports very high request rates: at least 3,500 PUT/COPY/POST/DELETE or 5,500 GET/HEAD requests per second per partitioned prefix in a bucket. There is no limit to the number of prefixes in a bucket, so you can scale throughput further by parallelizing across prefixes. Even so, it pays to understand how the S3 service behaves and to design your applications around it to achieve great performance.
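To see how prefix parallelism adds up, here is a small worked example. It assumes AWS’s published baseline of 3,500 write and 5,500 read requests per second per partitioned prefix; S3 scales up to these rates gradually, so the result is an idealized ceiling rather than a guarantee. `aggregate_request_rate` is a hypothetical helper.

```python
# Baseline per-prefix request rates that AWS documents for S3.
WRITE_RPS_PER_PREFIX = 3_500  # PUT/COPY/POST/DELETE
READ_RPS_PER_PREFIX = 5_500   # GET/HEAD


def aggregate_request_rate(num_prefixes: int, kind: str = "read") -> int:
    """Idealized upper estimate of requests/second across parallel prefixes."""
    if num_prefixes < 1:
        raise ValueError("num_prefixes must be at least 1")
    per_prefix = {"read": READ_RPS_PER_PREFIX, "write": WRITE_RPS_PER_PREFIX}[kind]
    return num_prefixes * per_prefix


print(aggregate_request_rate(10, "read"))  # 55000
```

For example, spreading reads across 10 prefixes allows up to 55,000 GET requests per second.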
Let us go through the guidelines and best practices for optimizing the performance of applications that use S3.
Horizontally Scaling Storage Connections
AWS S3 is not a typical storage server; it is a massive distributed system. AWS suggests issuing multiple concurrent requests to Amazon S3, whether you are uploading or retrieving data.
Spreading requests over separate connections helps maximize the bandwidth available from S3.
S3 does not limit the number of connections made to your bucket.
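As a minimal sketch of spreading requests over parallel connections, the helper below fans a list of object keys out over a thread pool. `run_concurrently` and the `request` callable are hypothetical; in practice `request` might wrap a boto3 client call, since boto3’s low-level clients can be shared across threads.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, Iterable, TypeVar

R = TypeVar("R")


def run_concurrently(
    keys: Iterable[str],
    request: Callable[[str], R],
    max_workers: int = 8,
) -> Dict[str, R]:
    """Issue request(key) for every key over a pool of parallel workers.

    `request` is a hypothetical callable, e.g. a wrapper around a boto3
    client's put_object or get_object call for one object.
    """
    key_list = list(keys)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # map() keeps results in input order; a failed call's exception is
        # re-raised when its result is collected.
        results = list(pool.map(request, key_list))
    return dict(zip(key_list, results))
```

For a single large file, boto3’s managed `upload_file` already uploads parts in parallel, so a pool like this is mainly useful for many separate objects.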
Use the Range HTTP header in a GET request to fetch a specific byte range of an object, transferring only the required portion.
Use concurrent connections to fetch different byte ranges of the same object. This improves aggregate throughput compared with a single whole-object request.
Typical sizes for byte-range requests are 8 MB or 16 MB.
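A minimal sketch of splitting an object into byte ranges of the suggested size. It uses 8 MiB (8 * 1024 * 1024 bytes) parts by default, and it assumes the object size is already known, for example from the `ContentLength` returned by boto3’s `head_object`; `byte_ranges` is a hypothetical helper.

```python
from typing import List

MiB = 1024 * 1024


def byte_ranges(object_size: int, part_size: int = 8 * MiB) -> List[str]:
    """Split an object of `object_size` bytes into HTTP Range header values.

    HTTP byte ranges are inclusive, so each part covers bytes start..end.
    """
    if object_size < 0 or part_size <= 0:
        raise ValueError("object_size must be >= 0 and part_size > 0")
    ranges = []
    for start in range(0, object_size, part_size):
        # The last part may be shorter than part_size.
        end = min(start + part_size, object_size) - 1
        ranges.append(f"bytes={start}-{end}")
    return ranges


print(byte_ranges(20 * MiB))
```

Each string can be passed as the `Range` argument of boto3’s `get_object` from a separate worker thread; concatenating the parts in order reassembles the object.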
Leverage S3 Transfer Acceleration
Consider using S3 Transfer Acceleration (S3-TA) for fast, secure file transfers over long geographic distances between a client and an S3 bucket.
S3-TA uses Amazon CloudFront’s globally distributed edge locations. When data arrives at an edge location, it is routed to the S3 bucket over an optimized network path.
S3-TA is recommended when you regularly transfer gigabytes to terabytes of data across continents.
Whether S3-TA helps depends on where your data originates, because network conditions such as configuration and routing vary between locations and times of day. AWS provides a Transfer Acceleration Speed Comparison tool to check the benefit from your location.
Users are charged for S3-TA only for transfers where it improves upload performance.
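Clients reach an accelerated bucket through a dedicated endpoint. A minimal sketch that builds that endpoint, assuming the documented `s3-accelerate` host names and the rule that Transfer Acceleration needs DNS-compliant bucket names without periods; `accelerate_endpoint` is a hypothetical helper and its name check is simplified.

```python
import re

# Transfer Acceleration requires DNS-compliant bucket names without periods.
# Simplified check: 3-63 chars of lowercase letters, digits and hyphens,
# starting and ending with a letter or digit.
_TA_BUCKET_NAME = re.compile(r"[a-z0-9][a-z0-9-]{1,61}[a-z0-9]")


def accelerate_endpoint(bucket: str, dualstack: bool = False) -> str:
    """Return the virtual-hosted-style S3 Transfer Acceleration URL for a bucket."""
    if not _TA_BUCKET_NAME.fullmatch(bucket):
        raise ValueError(f"bucket name not usable with Transfer Acceleration: {bucket!r}")
    host = "s3-accelerate.dualstack.amazonaws.com" if dualstack else "s3-accelerate.amazonaws.com"
    return f"https://{bucket}.{host}"


print(accelerate_endpoint("my-bucket"))  # https://my-bucket.s3-accelerate.amazonaws.com
```

With boto3 you would not build the URL by hand: after enabling acceleration on the bucket (`put_bucket_accelerate_configuration` with `{"Status": "Enabled"}`), create the client with `botocore.config.Config(s3={"use_accelerate_endpoint": True})`.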
Latest Version of the AWS SDKs
Use the latest AWS SDKs, which take advantage of the current performance best practices built into them.
For example, the SDKs include default logic to automatically retry requests on HTTP 5XX errors, and they keep adding code to detect and adapt to slow connections.
If you call the HTTP REST API directly, follow the same practices: use multiple connections to fetch object data in parallel and retry slow requests.
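If you call the REST API directly, you have to provide the retry logic the SDKs give you for free. A minimal sketch of retrying 5XX responses with capped exponential backoff and "full jitter"; `HttpError` and `op` are hypothetical stand-ins for your HTTP client, and the default delays are illustrative, not AWS-prescribed values.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class HttpError(Exception):
    """Hypothetical error raised by a raw HTTP client, carrying the status code."""

    def __init__(self, status: int) -> None:
        super().__init__(f"HTTP {status}")
        self.status = status


def call_with_backoff(
    op: Callable[[], T],
    max_attempts: int = 5,
    base_delay: float = 0.1,
    max_delay: float = 5.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call `op`, retrying 5XX errors with capped exponential backoff and full jitter."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        try:
            return op()
        except HttpError as err:
            # Client errors (4XX) are not retried, and the last failure is re-raised.
            if err.status < 500 or attempt == max_attempts:
                raise
            # Full jitter: wait a random time up to the exponential cap.
            cap = min(max_delay, base_delay * 2 ** (attempt - 1))
            sleep(random.uniform(0, cap))
    raise AssertionError("unreachable")
```

Randomizing the wait keeps many clients that hit the same 503 from retrying in lockstep; the injectable `sleep` makes the function easy to test.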
Using Caching for Frequently Accessed Content
S3 is often used as a centralized datastore from which a “working set” of data is repeatedly requested by users. For such workloads, services like Amazon CloudFront, Amazon ElastiCache, and AWS Elemental MediaStore can improve performance by caching frequently requested data.
Adopting caching in your architecture can reduce latency and increase data-transfer rates. Applications that use caching send fewer requests directly to S3, which can also reduce request costs.
Amazon CloudFront caches data at edge locations all over the world, which makes it a natural fit in front of S3.
Amazon ElastiCache is a managed, in-memory cache. It provisions EC2 instances in the chosen Region that cache objects in memory, which reduces GET latency and increases overall download throughput. To use it, your application populates the cache with hot objects and checks the cache before requesting them from S3.
AWS Elemental MediaStore is similar to CloudFront in some ways but is designed specifically for video workflows and media delivery from S3.
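The check-the-cache-first pattern behind an ElastiCache setup can be sketched in-process. This is a minimal illustration of cache-aside with least-recently-used eviction, not ElastiCache itself; `LRUCache` and the `fetch` callable are hypothetical, and a real deployment would also need an expiry or invalidation strategy.

```python
from collections import OrderedDict
from typing import Callable


class LRUCache:
    """Cache-aside wrapper: check memory first, fall back to `fetch` on a miss.

    `fetch` is a hypothetical callable that would read one object from S3
    (for example, wrapping a boto3 get_object call).
    """

    def __init__(self, fetch: Callable[[str], bytes], capacity: int = 128) -> None:
        if capacity < 1:
            raise ValueError("capacity must be at least 1")
        self._fetch = fetch
        self._capacity = capacity
        self._data: "OrderedDict[str, bytes]" = OrderedDict()

    def get(self, key: str) -> bytes:
        if key in self._data:
            # Hit: mark as most recently used and skip the S3 request.
            self._data.move_to_end(key)
            return self._data[key]
        # Miss: fetch from the origin and populate the cache.
        value = self._fetch(key)
        self._data[key] = value
        if len(self._data) > self._capacity:
            # Evict the least recently used entry.
            self._data.popitem(last=False)
        return value
```

Only misses reach S3, so a hot working set that fits in the cache turns most reads into memory lookups.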
Timeouts and Retries for Latency-Sensitive Applications
S3 automatically scales to handle higher request rates. However, while S3 is internally optimizing for a new request rate, you may temporarily receive HTTP 5XX responses (typically 503 Slow Down) until the optimization completes.
For latency-sensitive applications, AWS advises tracking slower operations and retrying them aggressively.
S3 provides consistent response times for requests of a fixed size. If you see abnormal behavior, identify the slowest 1% of requests and retry them. Often, even a single retry is effective at reducing latency.
For large requests, AWS suggests tracking the throughput being achieved and retrying the slowest requests. For smaller requests, a good guideline is to retry after about 2 seconds and, if another retry is needed, to back off, for example by issuing a second retry after an additional 4 seconds.
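A minimal sketch of retrying slow requests on that schedule: if no attempt has finished after 2 seconds a second one is started, after a further 4 seconds a third, and the first attempt to finish wins. `op` is a hypothetical zero-argument callable performing one GET or PUT; because earlier attempts keep running, this only suits idempotent operations, and `call_with_latency_retries` is a hypothetical helper.

```python
import concurrent.futures as cf
from typing import Callable, Sequence, TypeVar

T = TypeVar("T")


def call_with_latency_retries(op: Callable[[], T], retry_after: Sequence[float] = (2.0, 4.0)) -> T:
    """Return the first result from `op`, re-issuing it if it is slow.

    After each delay in `retry_after` without any attempt finishing, a new
    attempt is started alongside the old ones; the first to finish wins.
    """
    pool = cf.ThreadPoolExecutor(max_workers=len(retry_after) + 1)
    try:
        pending = {pool.submit(op)}
        for delay in retry_after:
            done, pending = cf.wait(pending, timeout=delay, return_when=cf.FIRST_COMPLETED)
            if done:
                # If the first finished attempt raised, .result() re-raises it.
                return next(iter(done)).result()
            pending.add(pool.submit(op))  # still slow: issue another attempt
        done, _ = cf.wait(pending, return_when=cf.FIRST_COMPLETED)
        return next(iter(done)).result()
    finally:
        # Don't block on straggling attempts; their results are discarded.
        pool.shutdown(wait=False)
```

Keeping the original attempt alive instead of cancelling it means a slow request that finishes just after the retry is issued is not wasted; the cost is a few extra requests for the slowest tail.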
Previously, AWS itself recommended random prefix names to improve S3 performance. That is no longer necessary, and you can use simple sequential naming patterns such as date-based prefixes. These recommendations change over time as the AWS team keeps improving S3, but, as of now, the patterns discussed above are a great starting point for optimizing S3 performance.