How to Implement API Rate Limiting — Token Bucket, Sliding Window, and the Edge Cases Nobody Warns You About

By banditz

Friday, April 17, 2026 • 6 min read

It happens to every API eventually. You launch without rate limiting because “we’ll add it later.” Three weeks later, someone discovers your API and hits it 50,000 times in a minute. Maybe it’s a bot scraping your data. Maybe it’s a customer’s broken retry loop. Maybe it’s a competitor stress-testing your infrastructure. The result is the same: your servers are drowning, your database is on fire, and legitimate users can’t get a response.

Rate limiting isn’t a nice-to-have. It’s the bouncer at the door that keeps your API from being abused into oblivion.

The Algorithms — Pick the Right One

There are four common rate limiting algorithms. Each has tradeoffs.

Fixed Window Counter

The simplest approach. Divide time into fixed windows (e.g., 1-minute intervals). Count requests per window. Reject when the count exceeds the limit.

Window 12:00-12:01: 98/100 requests used
Window 12:01-12:02: 0/100 requests used

The problem: a burst at the window boundary. If a client sends 100 requests at 12:00:59 and another 100 at 12:01:01, they’ve sent 200 requests in 2 seconds while technically staying within the per-minute limit. This can overload your server even though the rate limit is “working.”
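The fixed window logic is small enough to sketch in plain JavaScript. This is an in-memory, single-process illustration (function names are mine); production code would back the counter with a shared store such as Redis.

```javascript
// Minimal in-memory fixed window counter (sketch; names illustrative).
function createFixedWindow(limit, windowMs) {
  const counts = new Map(); // "clientId:windowStart" -> request count
  return function allow(clientId, nowMs) {
    // Snap the timestamp to the start of its fixed window
    const windowStart = Math.floor(nowMs / windowMs) * windowMs;
    const key = `${clientId}:${windowStart}`;
    const used = counts.get(key) || 0;
    if (used >= limit) return false; // window budget exhausted
    counts.set(key, used + 1);
    return true;
  };
}
```

Note that this sketch inherits the boundary problem described above: a client can spend one window's full budget just before the boundary and another full budget just after it.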

Sliding Window Log

Stores the timestamp of every request. When a new request arrives, count timestamps within the last window duration and reject if over the limit. Most accurate, but stores every timestamp — memory grows linearly with traffic.
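An in-memory sketch of the idea (names are illustrative); the per-request timestamp array is exactly where the linear memory growth comes from.

```javascript
// Sliding window log (sketch). Exact, but stores one timestamp per request.
function createSlidingLog(limit, windowMs) {
  const logs = new Map(); // clientId -> array of request timestamps
  return function allow(clientId, nowMs) {
    const log = logs.get(clientId) || [];
    // Drop timestamps that have aged out of the sliding window
    const fresh = log.filter((t) => t > nowMs - windowMs);
    if (fresh.length >= limit) {
      logs.set(clientId, fresh);
      return false;
    }
    fresh.push(nowMs);
    logs.set(clientId, fresh);
    return true;
  };
}
```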

Sliding Window Counter

A hybrid. Uses fixed windows but weights the previous window’s count based on overlap. If 70% of the current window has elapsed, count 30% of the previous window’s requests plus 100% of the current window’s. Good accuracy without per-request storage.
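The weighting can be sketched in-memory like this (names are illustrative; only two counters per client are kept, which is the whole point of the hybrid):

```javascript
// Sliding window counter (sketch): current count plus a weighted
// share of the previous window's count.
function createSlidingCounter(limit, windowMs) {
  const counts = new Map(); // clientId -> { windowStart, current, previous }
  return function allow(clientId, nowMs) {
    const windowStart = Math.floor(nowMs / windowMs) * windowMs;
    let s = counts.get(clientId);
    if (!s || s.windowStart !== windowStart) {
      // Roll over: the old window's count becomes "previous" if adjacent
      const previous =
        s && s.windowStart === windowStart - windowMs ? s.current : 0;
      s = { windowStart, current: 0, previous };
      counts.set(clientId, s);
    }
    // Fraction of the previous window still inside the sliding window
    const prevWeight = 1 - (nowMs - windowStart) / windowMs;
    const estimated = s.current + s.previous * prevWeight;
    if (estimated >= limit) return false;
    s.current += 1;
    return true;
  };
}
```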

Token Bucket

A bucket holds tokens, refilled at a fixed rate. Each request consumes a token. If the bucket is empty, the request is rejected. The bucket has a maximum capacity, allowing controlled bursts up to that capacity while enforcing the average rate over time.
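The refill arithmetic, sketched in-memory (names are illustrative; the Redis implementation in the next section performs the same math atomically):

```javascript
// Token bucket (sketch). Tokens refill continuously based on elapsed
// time, capped at the bucket's capacity.
function createTokenBucket(maxTokens, refillPerSec) {
  const buckets = new Map(); // clientId -> { tokens, lastRefill } (seconds)
  return function allow(clientId, nowSec) {
    const b =
      buckets.get(clientId) || { tokens: maxTokens, lastRefill: nowSec };
    // Refill for the time elapsed since the last request, capped at capacity
    b.tokens = Math.min(
      maxTokens,
      b.tokens + (nowSec - b.lastRefill) * refillPerSec
    );
    b.lastRefill = nowSec;
    buckets.set(clientId, b);
    if (b.tokens < 1) return false; // bucket empty
    b.tokens -= 1;
    return true;
  };
}
```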

For most APIs, token bucket or sliding window counter is the right choice. Token bucket if you want to allow controlled bursts (most APIs should). Sliding window counter if you want strict per-window enforcement.

Implementing Token Bucket with Redis

Redis is the standard backing store for rate limiting because it’s fast, atomic, and shared across application servers.

The token bucket needs two pieces of data per client: the number of remaining tokens and the last refill timestamp.

Here’s the logic in a Redis Lua script (atomic execution, no race conditions):

-- Token bucket rate limiter (Lua script for Redis)
local key = KEYS[1]
local max_tokens = tonumber(ARGV[1])   -- bucket capacity
local refill_rate = tonumber(ARGV[2])  -- tokens per second
local now = tonumber(ARGV[3])          -- current timestamp (seconds)

local data = redis.call('HMGET', key, 'tokens', 'last_refill')
local tokens = tonumber(data[1]) or max_tokens
local last_refill = tonumber(data[2]) or now

-- Calculate tokens to add since the last refill
local elapsed = now - last_refill
local new_tokens = elapsed * refill_rate
tokens = math.min(max_tokens, tokens + new_tokens)

local allowed = 0
if tokens >= 1 then
    tokens = tokens - 1
    allowed = 1
end

redis.call('HMSET', key, 'tokens', tokens, 'last_refill', now)
-- Clean up idle clients: expire after twice a full-refill interval
redis.call('EXPIRE', key, math.ceil(max_tokens / refill_rate) * 2)
return { allowed, math.floor(tokens) }

This runs atomically in Redis. No race conditions even with 50 application servers hitting the same key simultaneously. The EXPIRE ensures cleanup of inactive clients.

Call it from your application:

const result = await redis.eval(luaScript, 1,
    `ratelimit:${clientId}`,
    100,               // max 100 tokens (burst capacity)
    1.67,              // refill at 1.67/sec ≈ 100/min
    Date.now() / 1000  // current time in seconds
);

const [allowed, remaining] = result;

Setting Rate Limit Headers

Good APIs tell clients their rate limit status on every response. This lets well-behaved clients self-regulate instead of blindly hitting the limit.

The de facto standard headers (the IETF draft standardizes unprefixed RateLimit-* names, but the X- prefixed forms below remain the most widely deployed):

X-RateLimit-Limit: 100
X-RateLimit-Remaining: 67
X-RateLimit-Reset: 1712345678

When the limit is exceeded, return 429 Too Many Requests with a Retry-After header:

HTTP/1.1 429 Too Many Requests
Retry-After: 30
X-RateLimit-Limit: 100
X-RateLimit-Remaining: 0
X-RateLimit-Reset: 1712345678

{
    "error": "rate_limit_exceeded",
    "message": "Too many requests. Retry after 30 seconds.",
    "retry_after": 30
}

Include the Retry-After value in both the header and body. Some HTTP clients check headers, some parse the body.
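As a sketch, this logic fits in one framework-agnostic helper. The function name and the shape of the limiter result (`allowed`, `limit`, `remaining`, `resetEpochSec`) are my assumptions, not a standard API:

```javascript
// Build rate limit response metadata from a limiter result (sketch;
// the result shape here is an assumption for illustration).
function rateLimitResponse({ allowed, limit, remaining, resetEpochSec }, nowEpochSec) {
  const headers = {
    'X-RateLimit-Limit': String(limit),
    'X-RateLimit-Remaining': String(Math.max(0, remaining)),
    'X-RateLimit-Reset': String(resetEpochSec),
  };
  if (allowed) {
    return { status: 200, headers };
  }
  // Mirror Retry-After in both the header and the body
  const retryAfter = Math.max(1, Math.ceil(resetEpochSec - nowEpochSec));
  headers['Retry-After'] = String(retryAfter);
  return {
    status: 429,
    headers,
    body: {
      error: 'rate_limit_exceeded',
      message: `Too many requests. Retry after ${retryAfter} seconds.`,
      retry_after: retryAfter,
    },
  };
}
```

A web framework's middleware would then copy `headers` onto the response and, on 429, serialize `body` as JSON.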

Edge Cases That Break Naive Implementations

Client identification.

Most tutorials rate limit by IP address. This breaks immediately in production because:

  • Corporate NATs: thousands of users share one IP. Rate limiting by IP blocks an entire company.
  • VPNs and proxies: same problem.
  • Mobile carriers: CGNAT means millions of users share IP pools.

Rate limit by API key for authenticated endpoints. For unauthenticated endpoints (login, registration), rate limit by a combination of IP + other signals (user agent, fingerprint).

For login endpoints specifically, rate limit by both IP and target username. This prevents credential stuffing attacks where an attacker tries common passwords against thousands of usernames from a single IP.
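One way to sketch this is to derive two keys per login attempt and require both limiters to pass. The key format below is illustrative, not a convention:

```javascript
// Composite rate limit keys for a login attempt (sketch; key format
// is illustrative). Both keys must pass their own limiter: the IP key
// stops one source spraying many usernames, and the username key stops
// one account being hammered from many IPs.
function loginRateLimitKeys(ip, username) {
  return [
    `ratelimit:login:ip:${ip}`,
    `ratelimit:login:user:${username.toLowerCase()}`, // normalize case
  ];
}
```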

Per-endpoint limits.

Not all endpoints are equal. Your /search endpoint might hit the database hard and need aggressive limiting. Your /health endpoint should probably never be limited. Your /login endpoint needs very tight limits to prevent brute force.

Apply different limits per endpoint or endpoint group:

/api/search     → 30 requests/minute
/api/users      → 100 requests/minute
/api/login      → 5 requests/minute
/api/health     → no limit
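A sketch of a limit lookup mirroring that table (the default for unlisted endpoints is my assumption; `null` means unlimited):

```javascript
// Per-endpoint limit table (sketch; values mirror the table above).
const ENDPOINT_LIMITS = [
  { prefix: '/api/search', perMinute: 30 },
  { prefix: '/api/users',  perMinute: 100 },
  { prefix: '/api/login',  perMinute: 5 },
  { prefix: '/api/health', perMinute: null }, // null = no limit
];
const DEFAULT_PER_MINUTE = 60; // fallback for unlisted endpoints (assumption)

// Longest-prefix-wins is not needed here; first match suffices for
// this flat table.
function limitFor(path) {
  const match = ENDPOINT_LIMITS.find((e) => path.startsWith(e.prefix));
  return match ? match.perMinute : DEFAULT_PER_MINUTE;
}
```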

Distributed systems.

If you have multiple application servers behind a load balancer, each server needs to check the same rate limit counter. This is why Redis (or any shared store) is necessary. In-memory rate limiting on each server means a client can send N × server_count requests per window by hitting each server once.

Time synchronization.

If your application servers have slightly different clocks, timestamp-based algorithms can behave inconsistently. Use the Redis server’s time (redis.call('TIME')) instead of the application server’s clock.
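A hedged sketch of that, assuming an ioredis-style client where `redis.time()` wraps the TIME command (which returns seconds and microseconds as strings). The resulting value can be passed as the timestamp argument to the Lua script:

```javascript
// Read the Redis server's clock instead of the app server's (sketch;
// assumes an ioredis-style client exposing TIME as redis.time()).
async function redisNowSeconds(redis) {
  const [seconds, microseconds] = await redis.time();
  return Number(seconds) + Number(microseconds) / 1e6;
}
```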

Graceful Degradation

If Redis goes down, what happens to your rate limiter?

Bad answer: all requests are blocked because the rate limiter can’t check counts.

Good answer: fall back to a local in-memory rate limiter with generous limits, or allow requests through with logging and alerting.

async function checkRateLimit(clientId) {
    try {
        return await redisRateLimiter.check(clientId);
    } catch (error) {
        logger.warn('Redis rate limiter unavailable, falling back');
        return localRateLimiter.check(clientId);
    }
}

Rate limiting should protect your API, not become a single point of failure. Design it to fail open (allow requests) rather than fail closed (block everything), with monitoring to alert you when the fallback activates.

The Rate Limiting Checklist

  1. Pick the algorithm — token bucket for most APIs
  2. Use Redis — shared state across servers, atomic operations
  3. Identify clients properly — API key, not just IP
  4. Set per-endpoint limits — tight for auth, generous for reads
  5. Return proper headers — X-RateLimit-Limit, Remaining, Reset
  6. Return 429 with Retry-After — don’t just drop connections
  7. Handle Redis failure — fall back, don’t fail closed
  8. Monitor — track 429 rates, identify abusive patterns
  9. Document your limits — developers need to know them upfront

Rate limiting is one of those things that’s boring until you need it. And when you need it, you need it ten minutes ago. Build it before the bot finds your API. It’s a lot less stressful that way.



Step-by-Step Guide

1. Choose between rate limiting algorithms. Fixed window is simplest but allows bursts at window boundaries. Sliding window log is most accurate but memory intensive. Sliding window counter balances accuracy and performance. Token bucket allows controlled bursts while enforcing an average rate. For most APIs, token bucket or sliding window counter is the right choice.

2. Implement with Redis for distributed systems. Use Redis INCR with EXPIRE for fixed window. For token bucket, store the remaining tokens and the last refill timestamp. Redis's atomic operations prevent race conditions across multiple app servers; use Lua scripts for atomic multi-step operations.

3. Set proper rate limit headers. Return X-RateLimit-Limit, X-RateLimit-Remaining, and X-RateLimit-Reset headers on every response. Return 429 Too Many Requests with a Retry-After header when the limit is exceeded. These headers let clients self-regulate.

4. Handle edge cases. Identify clients by API key, not just IP, because multiple users share IPs behind NAT. Apply different limits per endpoint, because login and search have different abuse profiles. In distributed systems, have every app server check the same Redis counter.

5. Add graceful degradation. If Redis is down, do not block all requests. Fall back to in-memory rate limiting, or allow requests through with logging. Rate limiting should protect your API, not become a single point of failure.

Frequently Asked Questions

What is the difference between rate limiting and throttling?
Rate limiting rejects requests that exceed the limit with a 429. Throttling slows requests down by adding delays. Rate limiting is more common for APIs because it gives clients clear feedback; throttling is used when you want to degrade gracefully rather than reject.
Should I rate limit by IP or by API key?
By API key when possible, because multiple users share IPs behind corporate NATs and VPNs. By IP as a fallback for unauthenticated endpoints. For login endpoints, rate limit by both IP and username to prevent credential stuffing.
What rate limits should I set?
It depends on your API. General guidelines: 60-120 requests per minute for standard endpoints, 5-10 per minute for login, 1,000 per minute for read-heavy APIs. Monitor actual usage patterns before tightening limits; start generous and reduce based on data.
How do I handle rate limiting in microservices?
Use a shared Redis instance for counters, so every service checks the same key. With API gateways like Kong or Nginx, rate limiting happens at the gateway before requests reach your services, which centralizes the logic and reduces per-service complexity.
banditz

Bug bounty research at javahack team. Freelance bug bounty research.