Autoscaling Workloads

In Kubernetes, you can scale a workload automatically depending on the current demand for resources. This allows your cluster to react to changes in resource demand more elastically and efficiently.

When you scale a workload, you can either increase or decrease the number of replicas managed by the workload, or adjust the resources available to the replicas in-place.

The first approach is referred to as horizontal scaling, while the second is referred to as vertical scaling.

There are manual and automatic ways to scale your workloads, depending on your use case.

Scaling workloads manually

Kubernetes supports manual scaling of workloads. Horizontal scaling can be done using the kubectl CLI. For vertical scaling, you need to patch the resource definition of your workload.

See below for examples of both strategies; a brief sketch of the fields involved follows the list.

  • Horizontal scaling: Running multiple instances of your app
  • Vertical scaling: Resizing CPU and memory resources assigned to containers
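
For illustration, the two strategies act on different fields of the same workload. In this minimal sketch (the Deployment name, image, and resource values are placeholders, not taken from this page), horizontal scaling changes spec.replicas, while vertical scaling patches the per-container resources:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-app                # placeholder name
spec:
  replicas: 3                 # horizontal scaling changes this count, e.g.:
                              #   kubectl scale deployment/my-app --replicas=5
  selector:
    matchLabels:
      app: my-app
  template:
    metadata:
      labels:
        app: my-app
    spec:
      containers:
      - name: app
        image: nginx:1.25     # placeholder image
        resources:            # vertical scaling patches these values
          requests:
            cpu: 250m
            memory: 64Mi
          limits:
            cpu: 500m
            memory: 128Mi
```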

Scaling workloads automatically

Kubernetes also supports automatic scaling of workloads, which is the focus of this page.

The concept of autoscaling in Kubernetes refers to the ability to automatically update an object that manages a set of Pods (for example a Deployment).

Scaling workloads horizontally

In Kubernetes, you can automatically scale a workload horizontally using a HorizontalPodAutoscaler (HPA).

It is implemented as a Kubernetes API resource and a controller, and periodically adjusts the number of replicas in a workload to match observed resource utilization such as CPU or memory usage.

There is a walkthrough tutorial for configuring a HorizontalPodAutoscaler for a Deployment.
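
As an illustration, a minimal HorizontalPodAutoscaler targeting a hypothetical Deployment named my-app might look like the following; the replica bounds and the 50% CPU target are arbitrary example values:

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: my-app
spec:
  scaleTargetRef:                # the workload whose replica count the HPA manages
    apiVersion: apps/v1
    kind: Deployment
    name: my-app                 # placeholder name
  minReplicas: 2
  maxReplicas: 10
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 50   # add/remove replicas to keep average CPU utilization near 50%
```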

Scaling workloads vertically

FEATURE STATE: Kubernetes v1.25 [stable]

You can automatically scale a workload vertically using a VerticalPodAutoscaler (VPA). Unlike the HPA, the VPA doesn't come with Kubernetes by default, but is a separate project that can be found on GitHub.

Once installed, it allows you to create CustomResourceDefinitions (CRDs) for your workloads which define how and when to scale the resources of the managed replicas.

Note:

You will need to have the Metrics Server installed in your cluster for the HPA to work.

At the moment, the VPA can operate in four different modes:

  • Auto: Currently equivalent to Recreate; this might change to in-place updates in the future.
  • Recreate: The VPA assigns resource requests on pod creation and also updates them on existing pods by evicting them when the requested resources differ significantly from the new recommendation.
  • Initial: The VPA only assigns resource requests on pod creation and never changes them later.
  • Off: The VPA does not automatically change the resource requirements of the pods. The recommendations are calculated and can be inspected in the VPA object.
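
As a sketch, a VerticalPodAutoscaler object selecting one of these modes could look like the following; the target Deployment name is a placeholder, and the API group is the one installed by the VPA project:

```yaml
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: my-app-vpa
spec:
  targetRef:                 # the workload whose Pods the VPA manages
    apiVersion: apps/v1
    kind: Deployment
    name: my-app             # placeholder name
  updatePolicy:
    updateMode: "Auto"       # one of Auto, Recreate, Initial, Off (see the modes above)
```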

Requirements for in-place resizing

FEATURE STATE: Kubernetes v1.27 [alpha]

Resizing a workload in-place without restarting the Pods or their Containers requires Kubernetes version 1.27 or later. Additionally, the InPlacePodVerticalScaling feature gate needs to be enabled.

InPlacePodVerticalScaling: Enables in-place Pod vertical scaling.
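
With the feature gate enabled, a container can declare how it should react to a resize via resizePolicy, as in this hypothetical Pod (name, image, and values are placeholders):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: resizable-pod                  # placeholder name
spec:
  containers:
  - name: app
    image: nginx:1.25                  # placeholder image
    resizePolicy:
    - resourceName: cpu
      restartPolicy: NotRequired       # CPU can be resized in-place
    - resourceName: memory
      restartPolicy: RestartContainer  # a memory resize restarts this container
    resources:
      requests:
        cpu: 500m
        memory: 128Mi
```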

Autoscaling based on cluster size

For workloads that need to be scaled based on the size of the cluster (for example cluster-dns or other system components), you can use the Cluster Proportional Autoscaler. Just like the VPA, it is not part of the Kubernetes core, but hosted as its own project on GitHub.

The Cluster Proportional Autoscaler watches the number of schedulable nodes and cores and scales the number of replicas of the target workload accordingly.
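
To give a sense of how that ratio is expressed, the project's linear control mode reads its parameters from a ConfigMap along these lines; the name, namespace, and numbers are illustrative, assuming the autoscaler is configured to read this ConfigMap:

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: dns-autoscaler          # placeholder; must match what the autoscaler is told to watch
  namespace: kube-system
data:
  # one replica per 16 schedulable nodes or per 256 cores, whichever yields more replicas
  linear: |-
    {
      "coresPerReplica": 256,
      "nodesPerReplica": 16,
      "min": 1,
      "max": 100
    }
```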

If the number of replicas should stay the same, you can scale your workloads vertically according to the cluster size using the Cluster Proportional Vertical Autoscaler. The project is currently in beta and can be found on GitHub.

While the Cluster Proportional Autoscaler scales the number of replicas of a workload, the Cluster Proportional Vertical Autoscaler adjusts the resource requests for a workload (for example a Deployment or DaemonSet) based on the number of nodes and/or cores in the cluster.

Event-driven autoscaling

It is also possible to scale workloads based on events, for example using the Kubernetes Event Driven Autoscaler (KEDA).

KEDA is a CNCF graduated project enabling you to scale your workloads based on the number of events to be processed, for example the number of messages in a queue. A wide range of adapters for different event sources is available to choose from.
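
As a sketch, scaling a hypothetical consumer Deployment on the length of a RabbitMQ queue with KEDA's rabbitmq scaler could look like this; the names, queue, and threshold are assumptions:

```yaml
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: queue-consumer-scaler
spec:
  scaleTargetRef:
    name: queue-consumer          # placeholder Deployment name
  minReplicaCount: 0              # KEDA can scale idle workloads down to zero
  maxReplicaCount: 20
  triggers:
  - type: rabbitmq
    metadata:
      queueName: orders           # placeholder queue name
      mode: QueueLength
      value: "20"                 # target of 20 messages per replica
      hostFromEnv: RABBITMQ_HOST  # connection string taken from the target's environment
```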

Autoscaling based on schedules

Another strategy for scaling your workloads is to schedule the scaling operations, for example in order to reduce resource consumption during off-peak hours.

Similar to event-driven autoscaling, such behavior can be achieved using KEDA in conjunction with its Cron scaler. The Cron scaler allows you to define schedules (and time zones) for scaling your workloads in or out.
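
A ScaledObject using the Cron scaler might, for example, hold a workload at a higher replica count during business hours; the schedule, time zone, and names below are illustrative:

```yaml
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: business-hours-scaler
spec:
  scaleTargetRef:
    name: my-app                # placeholder Deployment name
  triggers:
  - type: cron
    metadata:
      timezone: Europe/Berlin   # IANA time zone name
      start: 0 8 * * *          # scale out at 08:00
      end: 0 20 * * *           # scale back in at 20:00
      desiredReplicas: "10"
```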

Scaling cluster infrastructure

If scaling workloads isn't enough to meet your needs, you can also scale your cluster infrastructure itself.

Scaling the cluster infrastructure normally means adding or removing nodes. Read cluster autoscaling for more information.

What's next

  • Learn more about scaling horizontally
    • Scale a StatefulSet
    • HorizontalPodAutoscaler Walkthrough
  • Resize Container Resources In-Place
  • Autoscale the DNS Service in a Cluster
  • Learn about cluster autoscaling
