How does AWS EC2 Auto Scaling work?

AWS EC2 auto scaling encapsulates the difference between the public cloud and old-school data centers. With AWS EC2 auto scaling, you can keep supply closely matched to demand, so that very little capacity is wasted. This is a massive contrast to old-school data centers, which had to be built to cope with maximum demand, even if that meant most of the capacity sat idle most of the time.

AWS EC2 auto scaling is based on defined groups

In keeping with the principle of creating failure-proof architecture, most workloads are now served by multiple instances (with similar, or even identical, characteristics) instead of one robust instance. These instances can be collected into a single EC2 auto scaling group so that they are managed as one unit.
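As a minimal sketch of what defining such a group might look like, the snippet below builds the parameters for boto3's `create_auto_scaling_group` call. The group name, launch template name, subnet IDs and sizes are hypothetical placeholders, and the API call itself is left as a comment so the snippet runs without AWS credentials.

```python
def build_group_params(name, launch_template, subnet_ids, min_size, max_size, desired):
    """Build keyword arguments for boto3's autoscaling create_auto_scaling_group."""
    if not (min_size <= desired <= max_size):
        raise ValueError("desired capacity must lie between min_size and max_size")
    return {
        "AutoScalingGroupName": name,
        "LaunchTemplate": {"LaunchTemplateName": launch_template, "Version": "$Latest"},
        "MinSize": min_size,
        "MaxSize": max_size,
        "DesiredCapacity": desired,
        # Subnets are passed as one comma-separated string.
        "VPCZoneIdentifier": ",".join(subnet_ids),
    }


# All values below are made up for illustration.
params = build_group_params(
    "web-asg", "web-template", ["subnet-aaaa1111", "subnet-bbbb2222"],
    min_size=2, max_size=6, desired=2,
)
# With credentials configured, this could be sent with:
#   boto3.client("autoscaling").create_auto_scaling_group(**params)
print(params)
```

The minimum and maximum sizes act as guard rails: whatever scaling method is used later, the group's instance count stays within them.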

AWS EC2 auto scaling monitors the health of all instances in an auto scaling group

To keep the group performing well, auto scaling runs regular health checks on all the instances in any given AWS EC2 auto scaling group. If it identifies that an instance is unhealthy, it automatically schedules that instance to be replaced by a new instance.
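The replace-on-failure behavior can be pictured with a small, self-contained simulation. This is not how AWS implements it internally: the `Instance` class, the instance ID format and the `replace_unhealthy` helper are all invented for illustration.

```python
from dataclasses import dataclass
from itertools import count


@dataclass
class Instance:
    instance_id: str
    healthy: bool = True


_ids = count(1)


def launch():
    """Create a fresh, healthy simulated instance with a new ID."""
    return Instance(f"i-sim{next(_ids):04d}")


def replace_unhealthy(group):
    """Return a new list in which every unhealthy instance is swapped for a fresh one."""
    return [inst if inst.healthy else launch() for inst in group]


group = [launch() for _ in range(3)]
group[1].healthy = False          # simulate a failed health check
group = replace_unhealthy(group)
print([inst.instance_id for inst in group])
```

The group size is unchanged after the swap; only the failed member is replaced.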

AWS EC2 auto scaling is free

The AWS EC2 auto scaling service itself is free to use. You do pay for any extra AWS EC2 instances it spins up on your behalf, but you stop paying for them once it shuts them down.

Setting up an AWS EC2 auto scaling group

There are currently four ways to scale any given AWS EC2 auto scaling group.

Manual – You modify the desired instance capacity and the instance count is updated accordingly.

Scheduled – You can schedule scaling by time of day, day of the week or date. This option is typically used for predictable workloads. For example, if an office works 9-5 Monday to Friday, then the vast bulk of its capacity is going to be needed during those hours, and the group can be scaled down outside of them.

Dynamic – Dynamic scaling is where you really take advantage of the cloud’s capabilities. Essentially, you set policies that define how your instances should be auto-scaled and you leave the cloud to manage the actual work.

Predictive – This option forecasts what capacity you are going to need and proactively schedules capacity to be available when it is needed (or released when it is not). You can either use the forecasts to inform manual or scheduled auto scaling (Forecast only) or combine predictive auto scaling with dynamic auto scaling (Forecast and scale) and basically have AWS manage everything for you (although, as always, it’s a good idea to double-check its effectiveness periodically).

At the risk of stating the obvious, the effectiveness of predictive scaling will depend largely on the predictability of your workload. Predictive auto scaling is backed by machine learning models that analyze your daily and weekly usage patterns, but that will be of little help if your usage patterns are continually changing.
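To make two of these options concrete, here is a hedged sketch that builds request parameters for boto3's `put_scheduled_update_group_action` (the 9-5 office example) and for a predictive policy passed to `put_scaling_policy`. The group name, action and policy names, capacities, time zone and CPU target are assumptions chosen for illustration, and the API calls are left as comments so the snippet runs offline.

```python
def scheduled_action(group, name, cron, desired, time_zone="Europe/London"):
    """Build kwargs for boto3's autoscaling put_scheduled_update_group_action."""
    return {
        "AutoScalingGroupName": group,
        "ScheduledActionName": name,
        "Recurrence": cron,        # cron fields: minute hour day-of-month month day-of-week
        "TimeZone": time_zone,     # assumed time zone for the office
        "DesiredCapacity": desired,
    }


def predictive_policy(group, target_cpu, scale=False):
    """Build kwargs for put_scaling_policy with a predictive scaling configuration."""
    return {
        "AutoScalingGroupName": group,
        "PolicyName": "cpu-forecast",   # hypothetical policy name
        "PolicyType": "PredictiveScaling",
        "PredictiveScalingConfiguration": {
            "MetricSpecifications": [{
                "TargetValue": target_cpu,
                "PredefinedMetricPairSpecification": {
                    "PredefinedMetricType": "ASGCPUUtilization",
                },
            }],
            # ForecastOnly produces forecasts without acting on them.
            "Mode": "ForecastAndScale" if scale else "ForecastOnly",
        },
    }


actions = [
    scheduled_action("web-asg", "office-open", "0 9 * * 1-5", desired=6),
    scheduled_action("web-asg", "office-close", "0 17 * * 1-5", desired=1),
]
forecast = predictive_policy("web-asg", target_cpu=50.0)
# client = boto3.client("autoscaling")
# for action in actions:
#     client.put_scheduled_update_group_action(**action)
# client.put_scaling_policy(**forecast)
print(actions, forecast)
```

Starting in forecast-only mode is a cautious way to check how well the forecasts track real usage before letting them drive capacity.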

Dynamic scaling policies

If you want to use dynamic scaling, then you need to set the conditions under which it will operate. In addition to the sort of metrics you would probably expect, such as CPU, memory and network utilization, SQS queue size and number of sessions, you can also define custom metrics to suit your particular field of operation. Note that EC2 does not report memory utilization by default, so it has to be published separately, for example through the CloudWatch agent.
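One common form of dynamic scaling is a target tracking policy, where you name a metric and a target value and the service adds or removes instances to stay near it. The sketch below builds the parameters for boto3's `put_scaling_policy`; the group name, policy name and 50% CPU target are assumptions for illustration.

```python
def cpu_target_tracking(group, target_percent):
    """Build kwargs for a target tracking policy on average group CPU utilization."""
    return {
        "AutoScalingGroupName": group,
        "PolicyName": f"cpu-{int(target_percent)}",   # hypothetical naming scheme
        "PolicyType": "TargetTrackingScaling",
        "TargetTrackingConfiguration": {
            "PredefinedMetricSpecification": {
                "PredefinedMetricType": "ASGAverageCPUUtilization",
            },
            "TargetValue": float(target_percent),
        },
    }


policy = cpu_target_tracking("web-asg", 50)
# With credentials configured:
#   boto3.client("autoscaling").put_scaling_policy(**policy)
print(policy)
```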

For example, let’s say you run a website for a fast-food outlet, which has a takeout service. You know most of your customers are local and that when the weather is good, they will generally come down to the restaurant themselves and either eat in or pick up their food to take out. On the other hand, as soon as the weather turns bad, they go online and order delivery. You could therefore integrate with a weather-forecast platform to update capacity in line with the latest weather predictions.
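A rough sketch of the weather idea: turn a forecast into a custom CloudWatch metric that a scaling policy could then follow. The linear model, the baseline and uplift figures, the namespace and the metric name are all invented for illustration. Only the overall shape of the `put_metric_data` request reflects the real CloudWatch API.

```python
def expected_delivery_orders(rain_probability, baseline_orders=40, bad_weather_uplift=2.5):
    """Hypothetical model: hourly online orders grow linearly with the chance of rain."""
    if not 0.0 <= rain_probability <= 1.0:
        raise ValueError("rain_probability must be between 0 and 1")
    return baseline_orders * (1 + (bad_weather_uplift - 1) * rain_probability)


def metric_payload(rain_probability):
    """Build kwargs for boto3's cloudwatch put_metric_data."""
    return {
        "Namespace": "Takeout/Forecast",              # made-up custom namespace
        "MetricData": [{
            "MetricName": "ExpectedDeliveryOrders",   # made-up metric name
            "Value": expected_delivery_orders(rain_probability),
            "Unit": "Count",
        }],
    }


payload = metric_payload(0.5)
# With credentials configured:
#   boto3.client("cloudwatch").put_metric_data(**payload)
print(payload["MetricData"][0]["Value"])
```

In practice, the rain probability would come from whichever weather service you integrate with, and the model would be fitted to your own order history.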

Scaling adjustment type

Once you have defined your scaling policy, you will need to define an adjustment type for it, which determines how each scaling action changes the group's capacity. The main options are:

ChangeInCapacity – you specify the exact number of instances to add or remove each time the policy is triggered.

ExactCapacity – you set a target instance count and auto scaling will aim to match it.

PercentChangeInCapacity – you specify the percentage by which to change the number of instances in an auto scaling group.
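The arithmetic behind the three options can be sketched in a few lines, using the identifiers as the Auto Scaling API spells them. The `apply_adjustment` helper is hypothetical, and the rounding rule for percentages (fractions are dropped, but a non-zero change smaller than one instance still moves the group by one) is written from our reading of the AWS documentation, so treat it as an assumption.

```python
def apply_adjustment(current, adjustment_type, value, min_size, max_size):
    """Return the new desired capacity after one scaling adjustment."""
    if adjustment_type == "ChangeInCapacity":
        target = current + value
    elif adjustment_type == "ExactCapacity":
        target = value
    elif adjustment_type == "PercentChangeInCapacity":
        delta = current * value / 100
        if delta != 0 and abs(delta) < 1:
            # A tiny non-zero change still moves the group by one instance.
            delta = 1 if delta > 0 else -1
        target = current + int(delta)   # int() truncates toward zero
    else:
        raise ValueError(f"unknown adjustment type: {adjustment_type}")
    # The group's size limits always win.
    return max(min_size, min(max_size, target))


print(apply_adjustment(10, "ChangeInCapacity", 3, 1, 20))          # 13
print(apply_adjustment(10, "ExactCapacity", 4, 1, 20))             # 4
print(apply_adjustment(10, "PercentChangeInCapacity", 25, 1, 20))  # 12
```

Percentage adjustments are handy for groups whose size varies a lot, because the step size grows and shrinks with the group.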

Load balancing

Once you have your AWS EC2 auto scaling group set up to your liking, your last task is to connect it to a load balancer. If you enable Auto Scaling with Elastic Load Balancing, then any instances spun up by AWS EC2 auto scaling will be automatically registered with the load balancer and likewise any instances shut down by AWS EC2 auto scaling will be automatically deregistered. While the instances are running, the load balancer will distribute the traffic across them.
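The register and deregister cycle can be illustrated with a toy model. The `TargetGroup` class and its plain round-robin routing are simplifications invented for this sketch and are not a description of how Elastic Load Balancing routes traffic. In boto3, the real attachment is made with `attach_load_balancer_target_groups`, shown here only as a comment.

```python
class TargetGroup:
    """Toy stand-in for a load balancer target group."""

    def __init__(self):
        self.targets = []

    def register(self, instance_id):
        """Called when auto scaling launches an instance."""
        self.targets.append(instance_id)

    def deregister(self, instance_id):
        """Called when auto scaling terminates an instance."""
        self.targets.remove(instance_id)

    def route(self, n_requests):
        """Spread n_requests over the registered targets in round-robin order."""
        counts = {target: 0 for target in self.targets}
        for i in range(n_requests):
            counts[self.targets[i % len(self.targets)]] += 1
        return counts


tg = TargetGroup()
tg.register("i-a")
tg.register("i-b")
tg.register("i-c")        # scale out: the new instance starts receiving traffic
before = tg.route(6)
tg.deregister("i-a")      # scale in: the removed instance stops receiving traffic
after = tg.route(6)
print(before, after)
# Real attachment, with a hypothetical group name and your own target group ARN:
#   boto3.client("autoscaling").attach_load_balancer_target_groups(
#       AutoScalingGroupName="web-asg", TargetGroupARNs=[target_group_arn])
```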
