About instance autoscaling in Cloud Run services  |  Cloud Run Documentation  |  Google Cloud (2024)

In Cloud Run, each revision is automatically scaled to the number of instances needed to handle all incoming requests, events, or CPU utilization.

When a revision does not receive any traffic, by default it is scaled in to zero instances. However, if desired, you can change this default to specify an instance to be kept idle or "warm" using the minimum instances setting. If you are using CPU outside of requests, you should set minimum instances equal to 1.

In addition to the rate of incoming requests, events, or CPU utilization, the number of instances scheduled is impacted by:

  • The average CPU utilization of existing instances over a one-minute window, with the aim of keeping scheduled instances at 60% CPU utilization.
  • The current request concurrency, compared to the maximum concurrency over a one-minute window.
  • The maximum number of instances setting.
  • The minimum number of instances setting.

The Cloud Run autoscaler evaluates these every 5 seconds.
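To make the interaction between these factors more concrete, here is a minimal Python sketch that estimates a desired instance count. It is not Cloud Run's actual algorithm, which this page does not describe. The proportional CPU formula, the function name `desired_instances`, and all parameter names and defaults are assumptions made for illustration only.

```python
import math

TARGET_CPU_PERCENT = 60  # target average CPU utilization, from the list above


def desired_instances(current, avg_cpu_percent, in_flight, max_concurrency,
                      min_instances=0, max_instances=100):
    """Hypothetical estimate of how many instances to schedule.

    Assumption: scale proportionally so that average CPU returns to the
    target, and never schedule fewer instances than the in-flight
    requests need at the configured maximum concurrency.
    """
    by_cpu = math.ceil(current * avg_cpu_percent / TARGET_CPU_PERCENT)
    by_concurrency = math.ceil(in_flight / max_concurrency)
    wanted = max(by_cpu, by_concurrency)
    # The minimum and maximum instances settings bound the result.
    return max(min_instances, min(wanted, max_instances))
```

Under these assumptions, 10 instances averaging 90% CPU would call for 15 instances, since 10 × 90 / 60 = 15, provided the maximum instances setting allows it.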

CPU always allocated and autoscaling

If you configure your Cloud Run service to have CPU always allocated, you should be aware of scaling to and from zero behavior.

CPU always allocated scaling from zero. Scaling from zero can only be triggered by a request, so a service that is not processing requests cannot scale from zero. For these workloads, you can either set minimum instances > 0, or include a "wake-up request" in your design to restart processing after scaling to zero.

CPU always allocated scaling to zero. Given that no instance is ever at 0% CPU, looking at all CPU usage would result in never scaling to zero. This means the decision to scale from one to zero can only be made by checking to see if the instance is processing a request.

About maximum instances

In some cases you may want to limit the total number of instances that can be started, for cost control reasons, or for better compatibility with other resources used by your service. For example, your Cloud Run service might interact with a database that can only handle a certain number of concurrent open connections.

You can use the maximum instances setting to limit the total number of instances that can be started in parallel, as documented in Setting a maximum number of instances.

Exceeding maximum instances

Under normal circumstances, your revision scales out by creating new instances to handle incoming traffic load. But when you set a maximum instances limit, in some scenarios there will be insufficient instances to meet that traffic load. In that case, incoming requests are queued (pending) as follows:

  • If new instances are starting up, such as during a scale-out, requests will pend for at least the average startup time of container instances of this service. This includes when the request initiates a scale-out, such as when scaling from zero.
  • If the startup time is less than 10 seconds, requests will pend for up to 10 seconds.
  • If there are no instances in the process of starting, and the request does not initiate a scale-out, requests will pend for up to 10 seconds.

During this time window, if an instance finishes processing requests, it becomes available to process the queued pending requests. If no instances become available during the window, the request fails with a 429 error code.
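The queueing rules above can be sketched as a small Python model. This is one reading of the rules rather than Cloud Run's implementation: treating the window as the larger of the average startup time and 10 seconds is an interpretation, and the function names, parameters, and the simplification that a served request returns 200 are all hypothetical.

```python
QUEUE_WINDOW_SECONDS = 10.0  # base pending window, from the rules above


def max_pending_seconds(instances_starting, avg_startup_seconds):
    """Approximate how long a request may stay queued before a 429."""
    if instances_starting:
        # Assumption: requests pend for at least the average startup
        # time, and for up to 10 seconds when startup is faster.
        return max(avg_startup_seconds, QUEUE_WINDOW_SECONDS)
    return QUEUE_WINDOW_SECONDS


def resolve_request(seconds_until_free_instance, instances_starting,
                    avg_startup_seconds):
    """Return a hypothetical HTTP status for a queued request.

    seconds_until_free_instance is None when no instance frees up.
    """
    window = max_pending_seconds(instances_starting, avg_startup_seconds)
    if (seconds_until_free_instance is not None
            and seconds_until_free_instance <= window):
        return 200  # simplification: the request is picked up and served
    return 429  # no instance became available within the window
```

Clients that can receive a 429 in this situation generally benefit from retrying with backoff, because capacity may free up shortly after the window closes.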

Scaling guarantees

The maximum instances limit is an upper limit per revision and it means that the number of instances for this revision shouldn't exceed the maximum.

Under normal circumstances, Cloud Run is able to scale out to the maximum instances limit very fast to handle all incoming requests or events. However, setting a high limit does not mean that your revision will be able to scale out to the specified number of instances at any given moment. In exceptional circumstances, Cloud Run can throttle scaling to ensure good service for all customers.

Exceeding maximum instances due to traffic spikes

In some cases, such as rapid traffic surges or system maintenance, Cloud Run might, for a short period of time, create more instances than are specified in the maximum instances setting. New instances can be started in excess of the maximum instances setting to replace existing instances and to provide a grace period for inflight requests to finish processing.

The maximum instance limit can be exceeded under normal operation a few times per week. The grace period usually lasts up to 15 minutes, or up to the value specified in the request timeout setting. These extra instances are destroyed within 15 minutes after they become idle.

If many replacements are needed, the updates are usually spread out over many minutes or hours, but each replacement has an excess instance for just the grace period. Instances in excess of the maximum instance value are normally less than twice the configured maximum instances limit, but can be much larger for sudden large traffic spikes.

Load tests experience more instances exceeding the maximum instances setting because the system may change where traffic spikes are served to preserve capacity for existing workloads that have sustained load patterns.

If your service cannot tolerate this temporary behavior, you may want to factor in a safety margin and set a lower maximum instances value.

Traffic splits

Because the maximum instances limit is a limit for each revision, if the service splits traffic across multiple revisions, the total number of instances for the service can exceed the maximum instances per revision. This can be observed in the Instance Count metrics.
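As a worked example of this arithmetic, the following Python sketch sums the per-revision limits of the revisions that receive traffic. The revision names and the assumption that revisions without traffic contribute no instances (that is, they have no minimum instances configured) are hypothetical.

```python
def service_instance_ceiling(max_instances_by_revision,
                             traffic_percent_by_revision):
    """Sum the maximum instances limits of revisions that receive traffic.

    Assumption: a revision with no traffic scales in to zero and adds
    nothing to the total.
    """
    return sum(
        limit
        for revision, limit in max_instances_by_revision.items()
        if traffic_percent_by_revision.get(revision, 0) > 0
    )


# Hypothetical service: three revisions, each limited to 10 instances,
# with traffic split 90/10 between two of them.
limits = {"rev-a": 10, "rev-b": 10, "rev-c": 10}
traffic = {"rev-a": 90, "rev-b": 10}
ceiling = service_instance_ceiling(limits, traffic)  # 10 + 10 = 20
```

In this example the service as a whole could run up to 20 instances even though each revision is capped at 10.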

Deployments

When you deploy a new revision to serve 100% of the traffic, Cloud Run starts enough instances of the new revision before directing traffic to it. This reduces the impact of new revision deployments on request latencies, notably when serving high levels of traffic. Because the maximum instances limit is a limit for each revision, during a deployment, the total number of instances for the service can exceed the maximum instances per revision. This can be observed in the Instance Count metrics.

Idle instances and minimizing cold starts

Cloud Run does not immediately shut down instances once they have handled all requests. To minimize the impact of cold starts, Cloud Run may keep some instances idle for a maximum of 15 minutes. These instances are ready to handle requests in case of a sudden traffic spike.

For example, when an instance has finished handling requests, it may remain idle for a period of time in case another request needs to be handled. An idle instance may persist resources, such as open database connections. Note that CPU is only allocated during request processing unless you explicitly configure your service to have CPU always allocated.

To keep idle instances permanently available, use the min-instance setting. Note that using this feature will incur cost even when the service is not actively serving requests.

Autoscaling and pending requests

  • If new instances are starting up, such as during a scale-out, requests will pend for at least the average startup time of container instances of this service. This includes when the request initiates a scale-out, such as when scaling from zero.
  • If the startup time is less than 10 seconds, requests will pend for up to 10 seconds.
  • If there are no instances in the process of starting, and the request does not initiate a scale-out, requests will pend for up to 10 seconds.

Autoscaling impact on backing services

As the number of instances automatically increases, your Cloud Run service might encounter limits with its backing services. For example, Cloud SQL has an API quota limit. Make sure these backing services have enough quota and can handle connections from all instances of your Cloud Run service. Consider setting a maximum number of instances to avoid overloading backing services.
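One way to choose such a limit is to work backwards from the connection capacity of the backing service. The Python sketch below does this for a database; the function name, the per-instance pool size, and the idea of reserving some connections for other clients are illustrative assumptions, not values taken from Cloud Run or Cloud SQL.

```python
def max_instances_for_connection_limit(db_max_connections,
                                       connections_per_instance,
                                       reserved_connections=0):
    """Largest instance count whose connection pools fit the database."""
    if connections_per_instance <= 0:
        raise ValueError("connections_per_instance must be positive")
    usable = db_max_connections - reserved_connections
    # Integer division rounds down so the budget is never exceeded.
    return max(usable // connections_per_instance, 0)


# Hypothetical numbers: 100 connections, 10 reserved for other clients,
# and a pool of 5 connections opened by each instance.
suggested_limit = max_instances_for_connection_limit(100, 5, reserved_connections=10)
```

With these numbers the sketch yields 18 instances, since (100 - 10) // 5 = 18. Because the maximum instances setting can be exceeded for short periods, it may be worth setting the limit somewhat lower than the computed value.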

Autoscaling and Pub/Sub

Google recommends using push subscriptions to consume messages from a Pub/Sub topic on Cloud Run. Pushed messages are received like HTTP requests by the container, thus triggering the same autoscaling behavior.
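Since a pushed message arrives as an ordinary HTTP POST, the container only needs to parse the request body. The following Python sketch decodes a push body, assuming the default wrapped payload format in which the message sits under a `message` key with base64-encoded `data`. The function name `decode_push_body` and the shape of the returned dictionary are hypothetical, and the HTTP server around it is left out.

```python
import base64
import json


def decode_push_body(body: bytes) -> dict:
    """Extract the message payload from a Pub/Sub push request body."""
    envelope = json.loads(body)
    message = envelope["message"]
    # The data field is base64-encoded and may be absent when a message
    # carries only attributes.
    data = base64.b64decode(message.get("data", "")).decode("utf-8")
    return {
        "message_id": message.get("messageId"),
        "attributes": message.get("attributes", {}),
        "data": data,
    }


# Hypothetical request body, built here only to exercise the function.
example_body = json.dumps({
    "message": {
        "data": base64.b64encode(b"hello").decode("ascii"),
        "messageId": "1",
        "attributes": {"source": "demo"},
    },
    "subscription": "projects/example/subscriptions/example-sub",
}).encode("utf-8")
decoded = decode_push_body(example_body)
```

A handler built on this would typically respond with a success status once the message is processed, so that Pub/Sub treats the message as acknowledged.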

Autoscaling and multiple containers (sidecars)

Cloud Run considers the CPU utilization of instances for autoscaling, where the CPU utilization of an instance is the percentage of allocated CPU in use.

Note that you allocate CPU when you set CPU limits at the container level. If you use multiple containers per instance, the actual CPU allocation for that instance is the sum of the CPU limits you set on each container.
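The calculation described above can be sketched in a few lines of Python. The container names and CPU figures are hypothetical, and the function is only an illustration of the arithmetic, not a Cloud Run API.

```python
def instance_cpu_utilization(cpu_limits, cpu_in_use):
    """Fraction of an instance's allocated CPU that is in use.

    cpu_limits maps each container name to its CPU limit (in vCPUs);
    the instance allocation is the sum of those limits.
    """
    allocated = sum(cpu_limits.values())
    if allocated <= 0:
        raise ValueError("total CPU allocation must be positive")
    return cpu_in_use / allocated


# Hypothetical instance: a main container with 2 vCPUs and a sidecar
# with 1 vCPU, together using 1.5 vCPUs.
utilization = instance_cpu_utilization({"app": 2.0, "proxy": 1.0}, 1.5)
```

Here the instance is allocated 3 vCPUs in total, so 1.5 vCPUs in use corresponds to 50% utilization for autoscaling purposes.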

What's next

  • To manage the maximum number of instances of your Cloud Run services, see Setting a maximum number of instances.
  • To manage the maximum number of simultaneous requests handled by each instance, see Setting concurrency.
  • To optimize your concurrency setting, see development tips for tuning concurrency.
  • To specify an idle instance to keep running to minimize latency or cold starts on first requests, see Using min-instance to enable idle instances.