API Management 101: Rate Limiting (2024)

API rate limiting is one of the fundamental aspects of managing traffic to your APIs. It is important for quality of service, efficiency and security. It is also one of the easiest and most efficient ways to control traffic to your APIs.

What is API rate limiting and how does it work?

An API rate limit is the maximum number of calls a client (API consumer) can make within a given period of time. Rate limits are commonly expressed in requests per second (RPS), although longer windows such as a minute or an hour are also used.

Let’s say you only want a client to call an API a maximum of 10 times per minute. You can apply a rate limit expressed as “10 requests per 60 seconds”. The client will be able to call the API successfully up to 10 times within any 60-second interval. If they call the API any more within that timeframe, they’ll get an error stating they have exceeded their rate limit.
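To make the “10 requests per 60 seconds” example concrete, here is a minimal sketch of a sliding-window limiter in Python. The class name SlidingWindowLimiter and the explicit now argument are assumptions made for illustration (passing the time in keeps the example deterministic); a production gateway would typically keep this state in a shared store rather than in process memory.

```python
from collections import deque


class SlidingWindowLimiter:
    """Allow at most `limit` calls within any rolling `window` seconds."""

    def __init__(self, limit: int, window: float) -> None:
        self.limit = limit
        self.window = window
        self.calls = deque()  # timestamps of accepted calls, oldest first

    def allow(self, now: float) -> bool:
        # Forget accepted calls that have aged out of the window.
        while self.calls and now - self.calls[0] >= self.window:
            self.calls.popleft()
        if len(self.calls) >= self.limit:
            return False  # the caller would return a rate-limit error here
        self.calls.append(now)
        return True


limiter = SlidingWindowLimiter(limit=10, window=60.0)
# Eleven calls, one per second starting at t=0: the eleventh is refused.
results = [limiter.allow(now=float(t)) for t in range(11)]
# Once the first call is 60 seconds old, a slot frees up again.
allowed_later = limiter.allow(now=60.0)
```

Because the limiter remembers the timestamp of every accepted call, the limit holds over any 60-second interval, not just over clock-aligned minutes.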

Benefits of rate limiting

API rate limiting can:

What are the different types of rate limiting?

There are different ways that you can approach API rate limiting.

Key-level rate limiting is focused on controlling API traffic from individual sources and making sure that users are staying within their prescribed limits. You could limit the rate of calls the user of a key can make to all available APIs (i.e. a global limit) or to specific, individual APIs (a key-level-per-API limit).

API-level rate limiting assesses all traffic coming into an API from all sources and ensures that the overall rate limit is not exceeded. This limit could be as simple as an estimate of the maximum number of requests you expect from users of your API. It could also be something more scientific and precise, such as the number of requests your system can handle while still performing at a high level. You can quickly establish this threshold with performance testing.

Which type of API rate limiting should you use?

These two approaches have different use cases. They can also be used in unison to power an overall API rate limiting strategy.

The simplest way to figure out which type of rate limit to apply is to ask a few questions:

Do you want to protect against denial of service attacks or overwhelming amounts of traffic from all users of the API? Then, go for an API-level global rate limit!
Do you want to limit the number of API requests a specific user can make to all APIs they have access to? Then choose a key-level global rate limit!
Do you want to limit the number of requests a specific user can make to specific APIs they have access to? Then it’s time for a key-level per-API rate limit.
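The three options can be combined in one check. The sketch below is one possible way to do that in Python, using simple fixed-window counters; the LayeredLimiter class, its parameter names and the example keys and API names are all hypothetical, and old counters are never cleaned up, which a real implementation would need to handle.

```python
from collections import defaultdict


class LayeredLimiter:
    """Fixed-window counters at three scopes; a request must pass all of them."""

    def __init__(self, api_limit, key_limit, key_api_limit, window=60):
        self.window = window
        self.limits = {"api": api_limit, "key": key_limit, "key_api": key_api_limit}
        self.counts = defaultdict(int)

    def allow(self, key: str, api: str, now: float) -> bool:
        bucket = int(now // self.window)  # index of the current fixed window
        scopes = {
            "api": ("api", api, bucket),               # all callers of this API
            "key": ("key", key, bucket),               # this key across all APIs
            "key_api": ("key_api", key, api, bucket),  # this key on this API
        }
        if any(self.counts[c] >= self.limits[s] for s, c in scopes.items()):
            return False
        for counter in scopes.values():
            self.counts[counter] += 1
        return True


limiter = LayeredLimiter(api_limit=5, key_limit=3, key_api_limit=2)
# Key-level per-API limit (2): alice's third call to "orders" is refused.
alice_orders = [limiter.allow("alice", "orders", now=0) for _ in range(3)]
# Key-level global limit (3): one more API is fine, the next call is not.
alice_users = limiter.allow("alice", "users", now=0)
alice_billing = limiter.allow("alice", "billing", now=0)
# API-level limit (5): other keys use up the rest of the "orders" budget.
bob_orders = [limiter.allow("bob", "orders", now=0) for _ in range(2)]
carol_orders = limiter.allow("carol", "orders", now=0)
dave_orders = limiter.allow("dave", "orders", now=0)
```

Counters are only incremented when every scope has room, so a refused request does not eat into any of the three budgets.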

How to implement rate limiting in API environments

If you want to implement API rate limiting, you have various strategies available to you, including several algorithm-based approaches. These include:

How to test API rate limiting

It’s important to test that your API rate limit is working as it should. It’s not the kind of thing you want untested when you’re facing a DoS attack! There are companies that will undertake API pen testing to test how robust your API security is, including how well your rate limiting works.
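A basic automated check is easy to sketch: send one request more than the limit allows and confirm the last one is refused. In the Python sketch below, check_rate_limit and make_fake_endpoint are hypothetical names, and the fake endpoint is only a stand-in for a real HTTP call to your own API; it is no substitute for load or pen testing.

```python
def check_rate_limit(send_request, limit: int) -> bool:
    """Send limit + 1 requests back to back; expect `limit` successes then a 429."""
    statuses = [send_request() for _ in range(limit + 1)]
    return all(s == 200 for s in statuses[:limit]) and statuses[limit] == 429


def make_fake_endpoint(limit: int):
    """Stand-in for a real HTTP call: returns 200 until `limit` is used up."""
    served = 0

    def send_request() -> int:
        nonlocal served
        if served >= limit:
            return 429
        served += 1
        return 200

    return send_request


# The endpoint enforces the limit we expect, so the check passes.
ok = check_rate_limit(make_fake_endpoint(limit=10), limit=10)
# The endpoint is configured far too generously, so the check fails.
misconfigured = check_rate_limit(make_fake_endpoint(limit=50), limit=10)
```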

You’ll also need to check your API rate limits are still appropriate as your business grows. An API management tool with a handy dashboard should make it easy for you to see which limits you have in place.

How long does the rate limit last?

There is no fixed answer to how long an API rate limit lasts. It is common to apply a dynamic rate limit based on the number of requests per second, but you could also think in terms of minutes, hours or whatever timeframe best suits your business model.

What is API throttling vs rate limiting?

There are two ways that requests can be handled once they exceed the prescribed limit. One is by returning an error (via API rate limiting); the other is by queueing the request (through throttling) to be executed later.

You can implement throttling at key or policy level, depending on your requirements. It’s a versatile approach that can work well if you prefer not to throw an error back when a rate limit is exceeded. By using throttling, you can instead queue the request to auto-retry.

Throttling means that you can protect your API while still enabling people to use it. However, it can slow down the service that the user receives considerably, so how to throttle API requests needs careful thought in terms of maintaining service quality and availability.
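The trade-off is easy to see in a small sketch. The Python below assumes a hypothetical Throttler class that, instead of refusing a request, works out how long it has to wait in the queue so that requests run at a steady rate; the time is passed in explicitly to keep the example deterministic.

```python
class Throttler:
    """Space requests at least `interval` seconds apart instead of rejecting them."""

    def __init__(self, rate_per_second: float) -> None:
        self.interval = 1.0 / rate_per_second
        self.next_free = 0.0  # earliest time the next request may run

    def schedule(self, now: float) -> float:
        """Return how long the request arriving at `now` must wait in the queue."""
        start = max(now, self.next_free)
        self.next_free = start + self.interval
        return start - now


throttler = Throttler(rate_per_second=2)
# Four requests arrive at the same instant; each waits longer than the last.
delays = [throttler.schedule(now=0.0) for _ in range(4)]
```

Every request is eventually served, but the wait grows with the size of the burst, which is why a cap on queue length or maximum wait is usually worth considering.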

What does “API rate limit exceeded” mean?

“API rate limit exceeded” means precisely what it says – that the client trying to call an API has exceeded its rate limit. This will result in the service producing a 429 error status response. You can modify that response to include relevant details about why the response has been triggered.
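As an illustration of a more informative 429 response, the Python sketch below builds a status, headers and JSON body. Retry-After is a standard HTTP header; the function name rate_limit_response and the JSON field names are assumptions, not the format of any particular gateway.

```python
import json
import math


def rate_limit_response(limit: int, window_seconds: int, retry_after: float):
    """Build (status, headers, body) for a request that exceeded its limit."""
    wait = math.ceil(retry_after)
    headers = {
        "Content-Type": "application/json",
        "Retry-After": str(wait),  # standard HTTP header, in whole seconds
    }
    body = json.dumps({
        "error": "rate_limit_exceeded",
        "message": f"Limit of {limit} requests per {window_seconds} seconds exceeded.",
        "retry_after_seconds": wait,
    })
    return 429, headers, body


status, headers, body = rate_limit_response(10, 60, retry_after=12.3)
```

Telling the client how long to wait lets well-behaved consumers back off instead of retrying blindly.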

How to bypass an API rate limit

While API rate limiting can go a long way towards protecting the availability of your APIs and downstream services, it is not without its flaws. Some individuals have worked out how to bypass an API rate limit. In fact, they’ve worked out several ways to do so.

If you use an IP-based rate-limiter, rather than key-level rate limiting, people could bypass your limits using proxy servers. They can multiply their usual quota by the number of proxies they can use.

Key-based API rate limiting can also be bypassed, by people creating multiple accounts and getting numerous keys.

There are other techniques out there, such as using client-side JavaScript to bypass rate limits, so be aware that knowing how to rate limit API products doesn’t make them impervious to being bypassed!

We mentioned pen testing above. While you’re thinking about API functionality, performance and testing, why not check out this article on API testing tools?

FAQs

How do you manage rate limits in API?

Different Methods of Rate Limiting

Throttling. Throttling is performed by setting up a temporary state within the API, so the API can properly assess all requests. ...
Request Queues. Another popular method of rate limiting is “request queues”, which limits the number of requests in any given period of time. ...
Algorithm-Based.

What is rate-limit by key API Management?

The rate-limit-by-key policy prevents API usage spikes on a per key basis by limiting the call rate to a specified number per a specified time period. The key can have an arbitrary string value and is typically provided using a policy expression.

What is rate limiting in API connect?

In API Connect, rate limits can be defined as unlimited, or with a specified number of calls per second, minute, hour, day, or week. Rate limits can be "hard" (enforced) or "soft". If the rate limit is hard and a call exceeds the limit, then the call is aborted and an error is returned.

What is rate limiting in API status code?

The HTTP 429 Too Many Requests client error response status code indicates the client has sent too many requests in a given amount of time. This mechanism of asking the client to slow down the rate of requests is commonly called "rate limiting".

What is the difference between API rate limiting and throttling?

While they share the common goal of managing API traffic, their approaches and purposes differ significantly. Rate limiting acts as the equitable gatekeeper, ensuring all users play by the same rules, while throttling is the adaptive traffic controller, maintaining the flow regardless of conditions.

How do you avoid hitting rate limits in API integration?

Reducing the number of API requests

Optimize your code to eliminate any unnecessary API calls. ...
Cache frequently used data. ...
Use bulk and batch endpoints, such as Update Many Tickets, that let you update up to 100 tickets with a single API request.
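Caching is the easiest of these to sketch. The Python below assumes a hypothetical CachedClient wrapper around whatever function performs the real API call; repeat lookups within the time-to-live are answered from memory and do not count against the rate limit.

```python
class CachedClient:
    """Serve repeat lookups from memory for `ttl` seconds before calling the API again."""

    def __init__(self, fetch, ttl: float) -> None:
        self.fetch = fetch   # function that performs the real API call
        self.ttl = ttl
        self.cache = {}      # resource id -> (expiry time, value)
        self.api_calls = 0   # how many real requests were spent

    def get(self, resource_id: str, now: float):
        entry = self.cache.get(resource_id)
        if entry is not None and now < entry[0]:
            return entry[1]  # fresh cached copy, no request spent
        value = self.fetch(resource_id)
        self.api_calls += 1
        self.cache[resource_id] = (now + self.ttl, value)
        return value


# The lambda stands in for a real API call.
client = CachedClient(fetch=lambda rid: {"id": rid}, ttl=30.0)
for t in (0.0, 5.0, 10.0, 40.0):
    client.get("ticket-1", now=t)
calls_spent = client.api_calls
```

Four lookups cost two real requests here: one at t=0 and one at t=40, after the cached copy has expired.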

Which best describes API rate limiting?

API rate limiting is a set of measures put in place to help ensure the stability and performance of an API system. It works by setting limits on how many requests can be made within a certain period of time — usually a few seconds or minutes — and what actions can be taken.

Where to implement rate limiting?

Rate limiting can be implemented at the network level, by setting limits on the rate of traffic or on the number of requests made to specific resources, or at the application level, by setting limits on the number of requests made by individual users or clients.

What to do when API rate limit exceeded?

Exceeding the rate limit

If you exceed your primary rate limit, you will receive a 403 or 429 response, and the x-ratelimit-remaining header will be 0. You should not retry your request until after the time specified by the x-ratelimit-reset header.
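On the client side, that advice might be sketched as follows in Python. The function name seconds_until_retry is hypothetical, and the code assumes that x-ratelimit-reset holds a Unix timestamp in seconds; check the documentation of the API you are calling, since header formats vary between providers.

```python
def seconds_until_retry(status: int, headers: dict, now: float) -> float:
    """Return how long to wait before retrying, or 0.0 if no wait is needed."""
    # Header names are case-insensitive in HTTP, so normalise them first.
    h = {k.lower(): v for k, v in headers.items()}
    limited = status in (403, 429) and h.get("x-ratelimit-remaining") == "0"
    if not limited:
        return 0.0
    # Assumed: the reset header is a Unix epoch timestamp in seconds.
    reset_at = float(h.get("x-ratelimit-reset", now))
    return max(0.0, reset_at - now)


wait = seconds_until_retry(
    429,
    {"X-RateLimit-Remaining": "0", "X-RateLimit-Reset": "1700000060"},
    now=1700000000.0,
)
no_wait = seconds_until_retry(200, {"X-RateLimit-Remaining": "42"}, now=1700000000.0)
```

A caller would sleep for the returned number of seconds before sending the request again.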

What is API rate limiter problem?

Rate limiting for APIs helps protect against malicious bot attacks as well. An attacker can use bots to make so many repeated calls to an API that it renders the service unavailable for anyone else, or crashes the service altogether. This is a type of DoS or DDoS attack.

What is rate limiting vulnerability in API?

API rate limiting is the practice of limiting the number of requests a user or client can make to an API within a given time frame. This helps to prevent abuse, misuse or overloading of the API infrastructure.

What is the rate limit per user?

A rate limiter specifies the limit for an API request per second or minute and optionally specifies the user identification rules to determine to which API request this limit is applied.

What are the limits of API usage?

General API Limits

Pro: 2 million requests per year.
Corporate: 6 million requests per year.
Enterprise: 200 million requests per year.

How to implement rate limiting in Java API?

Rate limiting controls the number of requests a user can make to an API within a specific time window. There are several algorithms to implement rate limiting, each with its own use cases and benefits. Today, we'll focus on four: fixed window counter, sliding window log, leaky bucket, and token bucket.
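Of the four, the token bucket is perhaps the simplest to illustrate. The sketch below is written in Python rather than Java to keep the example short, and the same logic ports directly; the TokenBucket class and its parameter names are assumptions for illustration. Each request spends one token, and tokens refill at a steady rate up to a fixed capacity, so short bursts are allowed while the long-run rate stays bounded.

```python
class TokenBucket:
    """Each request spends one token; tokens refill steadily up to `capacity`."""

    def __init__(self, capacity: float, refill_rate: float, now: float = 0.0) -> None:
        self.capacity = capacity
        self.refill_rate = refill_rate  # tokens added per second
        self.tokens = capacity          # start with a full bucket
        self.updated = now

    def allow(self, now: float) -> bool:
        # Top up the bucket for the time that has passed, capped at capacity.
        elapsed = now - self.updated
        self.tokens = min(self.capacity, self.tokens + elapsed * self.refill_rate)
        self.updated = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False


bucket = TokenBucket(capacity=3, refill_rate=1.0)
# A burst of four requests at t=0: three fit in the bucket, the fourth does not.
burst = [bucket.allow(now=0.0) for _ in range(4)]
# Two seconds later, two tokens have been refilled, so a request succeeds.
after_wait = bucket.allow(now=2.0)
```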

How do I increase the rate limit for Google API?

120 API calls per minute. To request an increase to these quotas: In the Google Cloud console, go to the IAM & admin > Quotas page. Select the API Keys API quota that you want to increase: Read requests per minute and/or Write requests per minute.
