It happens to every API eventually. You launch without rate limiting because “we’ll add it later.” Three weeks later, someone discovers your API and hits it 50,000 times in a minute. Maybe it’s a bot scraping your data. Maybe it’s a customer’s broken retry loop. Maybe it’s a competitor stress-testing your infrastructure. The result is the same: your servers are drowning, your database is on fire, and legitimate users can’t get a response.
Rate limiting isn’t a nice-to-have. It’s the bouncer at the door that keeps your API from being abused into oblivion.
The Algorithms — Pick the Right One
There are four common rate limiting algorithms. Each has tradeoffs.
Fixed Window Counter
The simplest approach. Divide time into fixed windows (e.g., 1-minute intervals). Count requests per window. Reject when the count exceeds the limit.
Window 12:00-12:01: 98/100 requests used
Window 12:01-12:02: 0/100 requests used
The problem: a burst at the window boundary. If a client sends 100 requests at 12:00:59 and another 100 at 12:01:01, they’ve sent 200 requests in 2 seconds while technically staying within the per-minute limit. This can overload your server even though the rate limit is “working.”
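To make the boundary problem concrete, here's a minimal in-memory sketch (class and method names are mine, not from any library):

```javascript
// Minimal in-memory fixed window counter -- an illustrative sketch,
// not production code.
class FixedWindowLimiter {
  constructor(limit, windowMs) {
    this.limit = limit;       // max requests per window
    this.windowMs = windowMs; // window length in milliseconds
    this.counts = new Map();  // "clientId:windowStart" -> request count
  }

  allow(clientId, now = Date.now()) {
    // Bucket the timestamp onto a fixed window boundary.
    const windowStart = Math.floor(now / this.windowMs) * this.windowMs;
    const key = `${clientId}:${windowStart}`;
    const count = this.counts.get(key) || 0;
    if (count >= this.limit) return false;
    this.counts.set(key, count + 1);
    return true;
  }
}
```

Note that nothing connects adjacent windows: a client who exhausts the limit at 12:00:59 gets a full fresh allowance at 12:01:00.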
Sliding Window Log
Stores the timestamp of every request. When a new request arrives, count timestamps within the last window duration and reject if over the limit. Most accurate, but stores every timestamp — memory grows linearly with traffic.
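An in-memory sketch shows where that memory cost lives (names are illustrative): the timestamp array grows with every allowed request.

```javascript
// In-memory sliding window log -- illustrative sketch. One stored
// timestamp per request is exactly the algorithm's memory problem.
class SlidingWindowLog {
  constructor(limit, windowMs) {
    this.limit = limit;
    this.windowMs = windowMs;
    this.logs = new Map(); // clientId -> array of request timestamps
  }

  allow(clientId, now = Date.now()) {
    const log = this.logs.get(clientId) || [];
    // Drop timestamps that have aged out of the sliding window.
    const fresh = log.filter((t) => t > now - this.windowMs);
    if (fresh.length >= this.limit) {
      this.logs.set(clientId, fresh);
      return false;
    }
    fresh.push(now);
    this.logs.set(clientId, fresh);
    return true;
  }
}
```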
Sliding Window Counter
A hybrid. Uses fixed windows but weights the previous window’s count based on overlap. If 70% of the current window has elapsed, count 30% of the previous window’s requests plus 100% of the current window’s. Good accuracy without per-request storage.
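The weighting rule is one line of arithmetic; a hypothetical helper (function name mine) makes it explicit:

```javascript
// Sliding window counter: estimate the rolling request count by weighting
// the previous fixed window's total by its overlap with the sliding window.
function slidingWindowEstimate(prevCount, currCount, windowMs, now) {
  const windowStart = Math.floor(now / windowMs) * windowMs;
  const elapsedFraction = (now - windowStart) / windowMs;
  // If 70% of the current window has elapsed, 30% of the previous
  // window still overlaps the sliding window.
  return prevCount * (1 - elapsedFraction) + currCount;
}
```

It's an approximation (it assumes requests were spread evenly across the previous window), but in practice the error is small and the storage is two counters per client instead of a timestamp per request.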
Token Bucket
A bucket holds tokens, refilled at a fixed rate. Each request consumes a token. If the bucket is empty, the request is rejected. The bucket has a maximum capacity, allowing controlled bursts up to that capacity while enforcing the average rate over time.
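The refill-and-consume loop can be sketched in memory first (illustrative names; a single-process sketch only, since production needs shared state):

```javascript
// In-memory token bucket -- illustrative sketch of the algorithm.
// Times are in seconds.
class TokenBucket {
  constructor(capacity, refillPerSec, now = Date.now() / 1000) {
    this.capacity = capacity;         // max tokens (burst size)
    this.refillPerSec = refillPerSec; // steady-state rate
    this.tokens = capacity;           // start full
    this.lastRefill = now;
  }

  allow(now = Date.now() / 1000) {
    // Refill based on elapsed time, capped at capacity.
    this.tokens = Math.min(
      this.capacity,
      this.tokens + (now - this.lastRefill) * this.refillPerSec
    );
    this.lastRefill = now;
    if (this.tokens < 1) return false;
    this.tokens -= 1; // consume one token for this request
    return true;
  }
}
```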
For most APIs, token bucket or sliding window counter is the right choice. Token bucket if you want to allow controlled bursts (most APIs should). Sliding window counter if you want strict per-window enforcement.
Implementing Token Bucket with Redis
Redis is the standard backing store for rate limiting because it’s fast, atomic, and shared across application servers.
The token bucket needs two pieces of data per client: the number of remaining tokens and the last refill timestamp.
Here’s the logic in a Redis Lua script (atomic execution, no race conditions):
-- Token bucket rate limiter (Lua script for Redis)
local key = KEYS[1]
local max_tokens = tonumber(ARGV[1])   -- bucket capacity
local refill_rate = tonumber(ARGV[2])  -- tokens per second
local now = tonumber(ARGV[3])          -- current timestamp (seconds)

local data = redis.call('HMGET', key, 'tokens', 'last_refill')
local tokens = tonumber(data[1]) or max_tokens
local last_refill = tonumber(data[2]) or now

-- Calculate tokens to add since the last refill
local elapsed = now - last_refill
local new_tokens = elapsed * refill_rate
tokens = math.min(max_tokens, tokens + new_tokens)

local allowed = 0
if tokens >= 1 then
  tokens = tokens - 1
  allowed = 1
end

-- Variadic HSET (Redis 4.0+) replaces the deprecated HMSET
redis.call('HSET', key, 'tokens', tokens, 'last_refill', now)
-- Expire idle buckets after twice the time a full refill would take
redis.call('EXPIRE', key, math.ceil(max_tokens / refill_rate) * 2)

return { allowed, math.floor(tokens) }
This runs atomically in Redis. No race conditions even with 50 application servers hitting the same key simultaneously. The EXPIRE ensures cleanup of inactive clients.
Call it from your application:
// ioredis-style eval: (script, numKeys, key, ...args) -- adjust for your client
const result = await redis.eval(luaScript, 1,
  `ratelimit:${clientId}`,
  100,               // max 100 tokens (burst capacity)
  100 / 60,          // refill rate: ~1.67 tokens/sec = 100/min
  Date.now() / 1000  // current time in seconds
);
const [allowed, remaining] = result;
Setting Rate Limit Headers
Good APIs tell clients their rate limit status on every response. This lets well-behaved clients self-regulate instead of blindly hitting the limit.
The widely used headers (note: the IETF RateLimit draft standardizes these without the X- prefix, as RateLimit-Limit, RateLimit-Remaining, and RateLimit-Reset, but the X-RateLimit-* form remains the most common convention in the wild):
X-RateLimit-Limit: 100
X-RateLimit-Remaining: 67
X-RateLimit-Reset: 1712345678
When the limit is exceeded, return 429 Too Many Requests with a Retry-After header:
HTTP/1.1 429 Too Many Requests
Retry-After: 30
X-RateLimit-Limit: 100
X-RateLimit-Remaining: 0
X-RateLimit-Reset: 1712345678
{
"error": "rate_limit_exceeded",
"message": "Too many requests. Retry after 30 seconds.",
"retry_after": 30
}
Include the Retry-After value in both the header and body. Some HTTP clients check headers, some parse the body.
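A small framework-agnostic helper can build these headers from the limiter's state (the function name and shape are my own; adapt it to your framework's response API):

```javascript
// Build rate limit response headers from limiter state.
// resetAt is a Unix timestamp in seconds; retryAfterSec is only
// meaningful when the client is over the limit.
function rateLimitHeaders(limit, remaining, resetAt, retryAfterSec) {
  const headers = {
    'X-RateLimit-Limit': String(limit),
    'X-RateLimit-Remaining': String(Math.max(0, remaining)),
    'X-RateLimit-Reset': String(resetAt),
  };
  if (remaining <= 0 && retryAfterSec != null) {
    headers['Retry-After'] = String(retryAfterSec);
  }
  return headers;
}
```

Attach the result to every response, not just 429s, so well-behaved clients can slow down before they hit the wall.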
Edge Cases That Break Naive Implementations
Client identification.
Most tutorials rate limit by IP address. This breaks immediately in production because:
- Corporate NATs: thousands of users share one IP. Rate limiting by IP blocks an entire company.
- VPNs and proxies: same problem.
- Mobile carriers: CGNAT means millions of users share IP pools.
Rate limit by API key for authenticated endpoints. For unauthenticated endpoints (login, registration), rate limit by a combination of IP + other signals (user agent, fingerprint).
For login endpoints specifically, rate limit by both IP and target username. This prevents credential stuffing attacks where an attacker tries common passwords against thousands of usernames from a single IP.
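One way to sketch the dual-bucket check (the key format and the checkBucket callback are my own conventions, not a library API):

```javascript
// Two independent rate limit keys for a login attempt: one for the
// source IP, one for the target account. Both buckets must allow.
function loginRateLimitKeys(ip, username) {
  return [
    `ratelimit:login:ip:${ip}`,
    `ratelimit:login:user:${username.toLowerCase()}`,
  ];
}

// checkBucket(key) -> Promise<boolean>; wire it to your limiter backend.
async function allowLogin(ip, username, checkBucket) {
  const results = await Promise.all(
    loginRateLimitKeys(ip, username).map((key) => checkBucket(key))
  );
  return results.every(Boolean); // reject if either bucket is exhausted
}
```

Normalizing the username (lowercasing here) matters: otherwise an attacker rotates `Alice`, `alice`, `ALICE` to get fresh buckets.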
Per-endpoint limits.
Not all endpoints are equal. Your /search endpoint might hit the database hard and need aggressive limiting. Your /health endpoint should probably never be limited. Your /login endpoint needs very tight limits to prevent brute force.
Apply different limits per endpoint or endpoint group:
/api/search → 30 requests/minute
/api/users → 100 requests/minute
/api/login → 5 requests/minute
/api/health → no limit
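One way to express this is a prefix-matched config table (the paths and numbers mirror the list above; using null for "no limit" is my own convention):

```javascript
// Per-endpoint limits, matched by path prefix. null means "unlimited".
const ENDPOINT_LIMITS = [
  { prefix: '/api/search', perMinute: 30 },
  { prefix: '/api/login',  perMinute: 5 },
  { prefix: '/api/health', perMinute: null },
  { prefix: '/api/',       perMinute: 100 }, // default for everything else
];

function limitFor(path) {
  // Check longest prefix first so specific routes win over the default.
  const sorted = [...ENDPOINT_LIMITS]
    .sort((a, b) => b.prefix.length - a.prefix.length);
  const match = sorted.find((rule) => path.startsWith(rule.prefix));
  return match ? match.perMinute : 100; // global fallback
}
```

The looked-up limit then feeds the limiter as its max_tokens / refill parameters, with the endpoint group included in the Redis key so buckets stay separate.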
Distributed systems.
If you have multiple application servers behind a load balancer, each server needs to check the same rate limit counter. This is why a shared store like Redis is necessary. With in-memory rate limiting on each server, a client can send limit × server_count requests per window by spreading its requests across the servers.
Time synchronization.
If your application servers have slightly different clocks, timestamp-based algorithms can behave inconsistently. Use the Redis server’s time (redis.call('TIME')) instead of the application server’s clock.
Graceful Degradation
If Redis goes down, what happens to your rate limiter?
Bad answer: all requests are blocked because the rate limiter can’t check counts.
Good answer: fall back to a local in-memory rate limiter with generous limits, or allow requests through with logging and alerting.
async function checkRateLimit(clientId) {
  try {
    return await redisRateLimiter.check(clientId);
  } catch (error) {
    // Redis is unreachable: fail open via a local in-memory limiter
    // (with generous limits) rather than rejecting every request.
    logger.warn('Redis rate limiter unavailable, falling back');
    return localRateLimiter.check(clientId);
  }
}
Rate limiting should protect your API, not become a single point of failure. Design it to fail open (allow requests) rather than fail closed (block everything), with monitoring to alert you when the fallback activates.
The Rate Limiting Checklist
- Pick the algorithm — token bucket for most APIs
- Use Redis — shared state across servers, atomic operations
- Identify clients properly — API key, not just IP
- Set per-endpoint limits — tight for auth, generous for reads
- Return proper headers — X-RateLimit-Limit, Remaining, Reset
- Return 429 with Retry-After — don’t just drop connections
- Handle Redis failure — fall back, don’t fail closed
- Monitor — track 429 rates, identify abusive patterns
- Document your limits — developers need to know them upfront
Rate limiting is one of those things that’s boring until you need it. And when you need it, you need it ten minutes ago. Build it before the bot finds your API. It’s a lot less stressful that way.