When talking about API integration, the matter of API rate limiting is unavoidable. Researching the rate limits of the APIs you are going to ‘consume’ in your integrations should be part of scoping your project. It is also a crucial detail when designing your integrations: rate limits determine early on which mechanisms you need to implement to avoid hitting them in the first place. So let’s talk in a bit more detail about API rate limiting, its techniques, and the workarounds and strategies that help you avoid hitting rate limits.
What is API rate limiting?
Let’s kick off with a short definition. Rate limiting is a mechanism that limits the usage of an API for a specific period of time. In other words, it is a way of limiting how many API calls its consumers can make within a given time frame. In essence, rate limiting helps keep API traffic in check.
Why does API rate limiting matter?
If you’re a software developer, you might not need any explanation on how scaling works in application development. But for completeness’ sake, let’s go briefly over this concept.
It is generally best practice to divide an application into front-end, back-end and database. At the very beginning, while you’re just starting out, you might pack it all on one server. But when the load and resource consumption increase, you would probably want to move the database to another server in order to ensure stable performance of your application. You can even move file storage to a separate server in order to reduce the load on the main server and increase the speed.
As your application becomes even more popular and widely used, you’ll need to scale your back-end and your database even further. This involves even more servers, for example to separate calculation processes in the database or to distribute queries evenly across the back-end. In short, you’ll end up with a significant number of servers.
And so, you would want to limit the rate of calls to your APIs for a few very simple reasons:
- Resources cost money. Too many API calls mean more resource consumption, which means you might get charged extra by your hosting or server provider. Or imagine your application makes use of yet another API that charges you per certain number of calls. For these reasons alone it is wise to implement API rate limiting in order to keep your own costs in check.
- You wouldn’t want a single user of your application (e.g. a large enterprise with many end-users) to overwhelm your systems and resources to the point that your other customers suffer from slow responses or time-outs.
- As an application or API provider who is a reliable business partner, you want to guarantee certain SLAs. In order to meet them, you have to be able to control the traffic rate coming to your APIs from each single customer or user to ensure that it is kept within agreed levels.
- Last but not least, limiting the rate of API calls can offer more protection from malicious attacks such as repeated login attempts by trial and error, ultimately increasing the security of your application.
Here are a few examples of how API rate limiting is handled by application providers:
Braze, a vendor of a comprehensive user engagement product, limits rate by request type and documents this in a pretty detailed table.
Intercom offers a more modest overview of its API rate limiting policy.
Box, on the other hand, also rate-limits by request type, but with fewer options than Braze in general.
SIDE NOTE: By the way, if you are, by any chance, an API developer and interested in some best practices on how to communicate about rate limiting to your API consumers, the Twitter API is quoted by an overwhelming number of sources as the poster child for this type of communication. They provide a detailed description of rate limits for each of their APIs in their documentation (the link leads to the overview of all Twitter APIs, and you can click on each one to dive deeper into the topic). Not only that: they also provide a dedicated API endpoint for the rate limit status.
A brief overview of rate limiting techniques
We are not going to dive too deep into the specifics of various techniques for API rate limiting, but it’s worth listing at least the most popular ones, even if only for the sake of comprehensiveness.
- Token Bucket
- Leaky Bucket
- Fixed Window
- Sliding Logs
- Sliding Window
For the Token Bucket technique, imagine you have a physical bucket with a certain number of tokens in it. For each request coming from the consumer, one token is “taken out” of the bucket. Once there are no tokens left, any request will be denied until the bucket receives a fresh batch of tokens, which happens every specified time unit, for example each minute.
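The token bucket described above can be sketched in a few lines of Python. This is only a minimal illustration, not a production limiter: the class name `TokenBucket`, its parameters and the injectable `clock` are assumptions made for this example, and it refills the whole batch once per interval, as in the description (many real implementations add tokens continuously instead).

```python
import time


class TokenBucket:
    """Bucket that is topped up to full capacity once per refill interval."""

    def __init__(self, capacity, refill_interval, clock=time.monotonic):
        self.capacity = capacity                # tokens in a fresh batch
        self.refill_interval = refill_interval  # seconds between batches
        self.clock = clock                      # injectable for testing
        self.tokens = capacity
        self.last_refill = clock()

    def allow(self):
        """Return True and consume a token if one is left, else False."""
        now = self.clock()
        if now - self.last_refill >= self.refill_interval:
            # A full interval has passed: hand out a fresh batch of tokens.
            self.tokens = self.capacity
            self.last_refill = now
        if self.tokens > 0:
            self.tokens -= 1
            return True
        return False
```

With `TokenBucket(capacity=3, refill_interval=60)`, the fourth call to `allow()` within the same minute returns `False`, and calls succeed again once the interval has elapsed.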
The Leaky Bucket algorithm is based on queues. The size of a queue is predetermined, for example 5 requests per queue. The queue accepts requests until its limit of 5 is reached. Queued requests are processed at a regular rate, but as long as the queue is still full, each further request that would otherwise lead to its “overflowing” (i.e. “leaking”) will be rejected.
With the Fixed Window algorithm, a fixed number of requests is allowed within a fixed time unit, say 50 requests per minute. Each time a request is submitted, the counter for the current minute is updated, and once it has reached 50, each further request will be rejected until the minute is over and the next minute starts. This algorithm is probably the simplest to implement but also the riskiest, as it doesn’t protect your system from a spike of requests, for example when all 50 requests are sent in the first second of a minute.
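As a rough sketch of the fixed window counter, assuming a single consumer and hypothetical names (`FixedWindowLimiter`, `allow`), the whole algorithm fits into one counter that is reset whenever a new window starts:

```python
import time


class FixedWindowLimiter:
    """Counts requests per fixed window and rejects once the limit is hit."""

    def __init__(self, limit, window_seconds, clock=time.time):
        self.limit = limit
        self.window_seconds = window_seconds
        self.clock = clock
        self.current_window = None  # index of the window being counted
        self.count = 0

    def allow(self):
        window = int(self.clock() // self.window_seconds)
        if window != self.current_window:
            # A new window has started: reset the counter.
            self.current_window = window
            self.count = 0
        if self.count < self.limit:
            self.count += 1
            return True
        return False
```

The weakness is visible in the code: nothing spreads requests within a window, so 50 requests at second 59 and another 50 at second 60 would all be accepted.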
With the Sliding Logs technique, each request from a specific consumer receives a time-stamped log entry, and the number of accepted requests per time unit is tracked with the help of these entries. Timestamps older than the window are simply deleted, as they are no longer relevant. If the number of requests exceeds the rate limit, the excess requests are moved into a queue. This technique is considered to be nearly flawless, and yet it is very expensive: it requires a lot of memory to store the logs and leads to high resource consumption because the number of log entries has to be recalculated continuously.
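The bookkeeping behind a sliding log could be sketched as follows. The class and method names are illustrative assumptions, and for brevity this version simply rejects excess requests instead of queuing them.

```python
import time
from collections import deque


class SlidingLogLimiter:
    """Keeps one timestamp per accepted request within the sliding window."""

    def __init__(self, limit, window_seconds, clock=time.monotonic):
        self.limit = limit
        self.window_seconds = window_seconds
        self.clock = clock
        self.log = deque()  # timestamps of accepted requests, oldest first

    def allow(self):
        now = self.clock()
        cutoff = now - self.window_seconds
        # Drop timestamps that have fallen out of the window.
        while self.log and self.log[0] <= cutoff:
            self.log.popleft()
        if len(self.log) < self.limit:
            self.log.append(now)
            return True
        return False  # simplification: reject instead of queuing
```

The memory cost mentioned above shows up directly: the deque holds one entry per accepted request, per consumer, for the full length of the window.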
The Sliding Window technique is considered to be a combination of the Fixed Window and Sliding Logs algorithms. There is a counter for each specified time window, say 5 seconds, and every time a new request comes in, the algorithm checks the number of requests made in the last 5 seconds. Suppose the limit is 5 requests per window: if the last 5 seconds, counted back from any given second, have always contained 5 requests, but the 5 seconds counted back from this precise second suddenly contain 6, the request that exceeds the limit will not be processed.
The image below demonstrates the difference between the Sliding Window and the Fixed Window techniques.
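One common way to approximate a sliding window without storing every timestamp is to keep two fixed-window counters and weight the previous one by how much of it still overlaps the sliding window. The sketch below assumes that approach; it is not necessarily what any particular provider uses, and all names in it are hypothetical.

```python
import time


class SlidingWindowCounter:
    """Estimates the request count of a sliding window from two fixed counters."""

    def __init__(self, limit, window_seconds, clock=time.time):
        self.limit = limit
        self.window_seconds = window_seconds
        self.clock = clock
        self.current_window = int(clock() // window_seconds)
        self.current_count = 0
        self.previous_count = 0

    def allow(self):
        now = self.clock()
        window = int(now // self.window_seconds)
        if window != self.current_window:
            # Keep the old counter only if it belongs to the directly preceding window.
            is_adjacent = window == self.current_window + 1
            self.previous_count = self.current_count if is_adjacent else 0
            self.current_window = window
            self.current_count = 0
        # Fraction of the current fixed window that has already elapsed.
        elapsed = (now % self.window_seconds) / self.window_seconds
        estimated = self.previous_count * (1 - elapsed) + self.current_count
        if estimated < self.limit:
            self.current_count += 1
            return True
        return False
```

Unlike a plain fixed window, a consumer that used up its 5 requests at the end of one window does not get 5 new ones the instant the next window starts; the allowance comes back gradually as the previous window slides out of view.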
How can rate limiting be an issue in API integration projects?
Well, to put it simply, it can break your integrations. Ok, this might be a bit overdramatic, but essentially, it can lead to lost data and malfunctioning integration flows, especially if there are no recovery or retry mechanisms in place. Zendesk, for example, would deactivate your webhook if there is an outage of several minutes on your side, meaning that the integration flow will just stop working.
This can become an even bigger issue if you don’t have an integration monitoring, logging and alerting system that could flag a failed integration to you, so that you can retry the flow manually, for example, or remedy the failed request in some other way.
Workarounds to recover from hitting API rate limits or to prevent this from happening in the first place
First and foremost, as a consumer of an API, you obviously must be aware of its rate limits to design your integration tasks in such a way that you avoid hitting them.
For our REST API integration component, which can be used to integrate with any REST API for which there is no dedicated connector yet, we implemented the Delay feature, which enables you to delay the execution of the next request. For example, if a flow type is set to Ordinary (the other type would be Real Time) and is scheduled to run every minute, but the delay is set to 120 seconds, the next flow execution will run only after 120 seconds instead of 1 minute.
In our case, we pair the Delay feature with the Calls Count feature. The time for delay is calculated as Delay / Calls Count, where the default value of Calls Count is 1. It is possible, though, to use another value.
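To make the Delay / Calls Count arithmetic concrete, here is a tiny illustration. The function name `delay_between_calls` is made up for this example and is not part of the component’s actual interface; it only mirrors the formula stated above.

```python
def delay_between_calls(delay_seconds, calls_count=1):
    """Return the pause per call, computed as Delay / Calls Count."""
    if calls_count < 1:
        raise ValueError("calls_count must be at least 1")
    return delay_seconds / calls_count
```

With the default Calls Count of 1, a Delay of 120 yields a 120-second pause; with a Calls Count of 4, the same Delay yields a 30-second pause per call.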
Another workaround to prevent hitting the API limit is to keep the connection alive. Some APIs are designed in such a way that they count each connection as an actual request. If you have, for example, a rate limit of 10 calls per minute, you can use one of these calls to establish a connection, keep it open for the rest of the minute, and then use the remaining 9 calls to GET or POST the data you need. Admittedly, though, this workaround is much less graceful than delaying subsequent executions, and keeping a connection alive is generally considered rather bad practice, as it might lead to unnecessary server load.
Ok, so what do you do when you have already hit rate limits? In this case, there are a few options too. The recommended option is to have a retry mechanism in place for when your integration hits an API’s rate limit and receives the 429 Too Many Requests response. It is common to build in three to five retry attempts, where each attempt occurs after a gradually increasing ‘sleeping’ duration. This gives the server time to catch up and either reset the counter, process a queued request to free up space, or give you a fresh batch of tokens, depending on the implemented API rate limiting technique. On our platform we actually have 10 retries, each with an exponentially increasing delay.
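A retry loop with exponentially increasing pauses might look like the sketch below. It is deliberately independent of any HTTP library: `send` is a hypothetical callable that performs the request and returns a `(status, body)` pair, and `call_with_retries`, `max_retries` and `base_delay` are illustrative names, not the API of our platform.

```python
import time


def call_with_retries(send, max_retries=5, base_delay=1.0, sleep=time.sleep):
    """Call send() and retry on HTTP 429, doubling the pause each time."""
    for attempt in range(max_retries + 1):
        status, body = send()
        if status != 429:
            return status, body  # success or a non-rate-limit error
        if attempt < max_retries:
            # Pause 1x, 2x, 4x, ... base_delay before the next attempt.
            sleep(base_delay * 2 ** attempt)
    return status, body  # still rate-limited after all retries
```

With the defaults, the pauses are 1, 2, 4, 8 and 16 seconds. Some APIs also send a Retry-After header along with the 429 response; where it is available, honoring that value is usually preferable to a guessed delay.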
There is another workaround as well, although this one is relatively tricky to implement and, just like the ‘keeping the connection alive’ trick, less graceful than the one described above: you can increase the size of the page returned per request. Say you set the size to 100 data objects and you notice that your integration hits API rate limits time and again. You can go ahead and increase the page size to 200 data objects per request and see if this solves the issue or at least reduces the frequency of alerts.
Of course, the key prerequisite here is that the API provider supports pagination. Without it, it will be impossible to control the size of the page. Also, you need to be really careful about increasing the page size and find just the right balance because if the size of incoming data is too large, your system might not be able to process it at all or won’t have enough time to download all data in the first place before the connection times out. And certainly, the effectiveness or validity of this workaround depends on the character of your integration.
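The effect of the page size on the number of calls is simple arithmetic, as the following sketch shows. The function name `requests_needed` and the record counts in the note below are made up for illustration.

```python
import math


def requests_needed(total_records, page_size):
    """Number of paginated requests required to fetch all records."""
    if page_size < 1:
        raise ValueError("page_size must be at least 1")
    return math.ceil(total_records / page_size)
```

For a hypothetical data set of 10,000 objects, a page size of 100 requires 100 requests, while a page size of 200 requires only 50, which halves the pressure on the rate limit at the cost of larger responses.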