Autoscaling Workloads

In Kubernetes, you can scale a workload automatically depending on the current demand for resources. This allows your cluster to react to changes in resource demand more elastically and efficiently.

When you scale a workload, you can either increase or decrease the number of replicas managed by the workload, or adjust the resources available to the replicas in-place.

The first approach is referred to as horizontal scaling, while the second is referred to as vertical scaling.

There are manual and automatic ways to scale your workloads, depending on your use case.

Scaling workloads manually

Kubernetes supports manual scaling of workloads. Horizontal scaling can be done using the kubectl CLI. For vertical scaling, you need to patch the resource definition of your workload.

See below for examples of both strategies; a brief sketch of the fields involved follows the list.

  • Horizontal scaling: Running multiple instances of your app
  • Vertical scaling: Resizing CPU and memory resources assigned to containers
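
For illustration, the two strategies act on different fields of the same workload. In this minimal sketch (the Deployment name, image, and resource values are placeholders, not taken from this page), horizontal scaling changes spec.replicas, while vertical scaling patches the per-container resources:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-app                # placeholder name
spec:
  replicas: 3                 # horizontal scaling changes this count, e.g.:
                              #   kubectl scale deployment/my-app --replicas=5
  selector:
    matchLabels:
      app: my-app
  template:
    metadata:
      labels:
        app: my-app
    spec:
      containers:
      - name: app
        image: nginx:1.25     # placeholder image
        resources:            # vertical scaling patches these values
          requests:
            cpu: 250m
            memory: 64Mi
          limits:
            cpu: 500m
            memory: 128Mi
```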

Scaling workloads automatically

Kubernetes also supports automatic scaling of workloads, which is the focus of this page.

The concept of autoscaling in Kubernetes refers to the ability to automatically update an object that manages a set of Pods (for example a Deployment).

Scaling workloads horizontally

In Kubernetes, you can automatically scale a workload horizontally using a HorizontalPodAutoscaler (HPA).

It is implemented as a Kubernetes API resource and a controller, and periodically adjusts the number of replicas in a workload to match observed resource utilization such as CPU or memory usage.

There is a walkthrough tutorial for configuring a HorizontalPodAutoscaler for a Deployment.
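
As an illustration, a minimal HorizontalPodAutoscaler targeting a hypothetical Deployment named my-app might look like the following; the replica bounds and the 50% CPU target are arbitrary example values:

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: my-app
spec:
  scaleTargetRef:                # the workload whose replica count the HPA manages
    apiVersion: apps/v1
    kind: Deployment
    name: my-app                 # placeholder name
  minReplicas: 2
  maxReplicas: 10
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 50   # add/remove replicas to keep average CPU utilization near 50%
```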

Scaling workloads vertically

FEATURE STATE: Kubernetes v1.25 [stable]

You can automatically scale a workload vertically using a VerticalPodAutoscaler (VPA). Unlike the HPA, the VPA doesn't come with Kubernetes by default, but is a separate project that can be found on GitHub.

Once installed, it allows you to create CustomResourceDefinitions (CRDs) for your workloads which define how and when to scale the resources of the managed replicas.

Note:

You will need to have the Metrics Server installed in your cluster for the HPA to work.

At the moment, the VPA can operate in four different modes:

  • Auto: Currently equivalent to Recreate; this might change to in-place updates in the future.
  • Recreate: The VPA assigns resource requests on pod creation and also updates them on existing pods by evicting them when the requested resources differ significantly from the new recommendation.
  • Initial: The VPA only assigns resource requests on pod creation and never changes them later.
  • Off: The VPA does not automatically change the resource requirements of the pods. The recommendations are calculated and can be inspected in the VPA object.
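
As a sketch, a VerticalPodAutoscaler object selecting one of these modes could look like the following; the target Deployment name is a placeholder, and the API group is the one installed by the VPA project:

```yaml
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: my-app-vpa
spec:
  targetRef:                 # the workload whose Pods the VPA manages
    apiVersion: apps/v1
    kind: Deployment
    name: my-app             # placeholder name
  updatePolicy:
    updateMode: "Auto"       # one of Auto, Recreate, Initial, Off (see the modes above)
```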

Requirements for in-place resizing

FEATURE STATE: Kubernetes v1.27 [alpha]

Resizing a workload in-place without restarting the Pods or their Containers requires Kubernetes version 1.27 or later. Additionally, the InPlacePodVerticalScaling feature gate needs to be enabled.

InPlacePodVerticalScaling: Enables in-place Pod vertical scaling.
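
With the feature gate enabled, a container can declare how it should react to a resize via resizePolicy, as in this hypothetical Pod (name, image, and values are placeholders):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: resizable-pod                  # placeholder name
spec:
  containers:
  - name: app
    image: nginx:1.25                  # placeholder image
    resizePolicy:
    - resourceName: cpu
      restartPolicy: NotRequired       # CPU can be resized in-place
    - resourceName: memory
      restartPolicy: RestartContainer  # a memory resize restarts this container
    resources:
      requests:
        cpu: 500m
        memory: 128Mi
```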

Autoscaling based on cluster size

For workloads that need to be scaled based on the size of the cluster (for example cluster-dns or other system components), you can use the Cluster Proportional Autoscaler. Just like the VPA, it is not part of the Kubernetes core, but hosted as its own project on GitHub.

The Cluster Proportional Autoscaler watches the number of schedulable nodes and cores and scales the number of replicas of the target workload accordingly.
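
To give a sense of how that ratio is expressed, the project's linear control mode reads its parameters from a ConfigMap along these lines; the name, namespace, and numbers are illustrative, assuming the autoscaler is configured to read this ConfigMap:

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: dns-autoscaler          # placeholder; must match what the autoscaler is told to watch
  namespace: kube-system
data:
  # one replica per 16 schedulable nodes or per 256 cores, whichever yields more replicas
  linear: |-
    {
      "coresPerReplica": 256,
      "nodesPerReplica": 16,
      "min": 1,
      "max": 100
    }
```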

If the number of replicas should stay the same, you can scale your workloads vertically according to the cluster size using the Cluster Proportional Vertical Autoscaler. The project is currently in beta and can be found on GitHub.

While the Cluster Proportional Autoscaler scales the number of replicas of a workload, the Cluster Proportional Vertical Autoscaler adjusts the resource requests for a workload (for example a Deployment or DaemonSet) based on the number of nodes and/or cores in the cluster.

Event-driven autoscaling

It is also possible to scale workloads based on events, for example using the Kubernetes Event Driven Autoscaler (KEDA).

KEDA is a CNCF graduated project enabling you to scale your workloads based on the number of events to be processed, for example the number of messages in a queue. A wide range of adapters for different event sources is available to choose from.
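
As a sketch, scaling a hypothetical consumer Deployment on the length of a RabbitMQ queue with KEDA's rabbitmq scaler could look like this; the names, queue, and threshold are assumptions:

```yaml
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: queue-consumer-scaler
spec:
  scaleTargetRef:
    name: queue-consumer          # placeholder Deployment name
  minReplicaCount: 0              # KEDA can scale idle workloads down to zero
  maxReplicaCount: 20
  triggers:
  - type: rabbitmq
    metadata:
      queueName: orders           # placeholder queue name
      mode: QueueLength
      value: "20"                 # target of 20 messages per replica
      hostFromEnv: RABBITMQ_HOST  # connection string taken from the target's environment
```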

Autoscaling based on schedules

Another strategy for scaling your workloads is to schedule the scaling operations, for example in order to reduce resource consumption during off-peak hours.

Similar to event-driven autoscaling, such behavior can be achieved using KEDA in conjunction with its Cron scaler. The Cron scaler allows you to define schedules (and time zones) for scaling your workloads in or out.
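
A ScaledObject using the Cron scaler might, for example, hold a workload at a higher replica count during business hours; the schedule, time zone, and names below are illustrative:

```yaml
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: business-hours-scaler
spec:
  scaleTargetRef:
    name: my-app                # placeholder Deployment name
  triggers:
  - type: cron
    metadata:
      timezone: Europe/Berlin   # IANA time zone name
      start: 0 8 * * *          # scale out at 08:00
      end: 0 20 * * *           # scale back in at 20:00
      desiredReplicas: "10"
```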

Scaling cluster infrastructure

If scaling workloads isn't enough to meet your needs, you can also scale your cluster infrastructure itself.

Scaling the cluster infrastructure normally means adding or removing nodes. Read cluster autoscaling for more information.

What's next

  • Learn more about scaling horizontally
    • Scale a StatefulSet
    • HorizontalPodAutoscaler Walkthrough
  • Resize Container Resources In-Place
  • Autoscale the DNS Service in a Cluster
  • Learn about cluster autoscaling
