Every Next.js app that exposes API routes or server actions to the internet faces the same harsh reality: without rate limiting, a single bad actor can overwhelm your server, drain your AI credits, or brute-force your login forms in minutes. Rate limiting controls how many requests a client can make within a given time window, and it's one of the most effective defenses you can add to a production app.
In this guide, you'll learn how to implement rate limiting across every surface of a Next.js App Router application — middleware, route handlers, server actions, and AI-powered endpoints — using both zero-dependency approaches and the production-grade Upstash Redis library.
Why Rate Limiting Matters for Next.js Apps
Next.js applications deployed on serverless platforms like Vercel are billed per function invocation and bandwidth. Without rate limiting, you're leaving the front door wide open. Here's what can go wrong:
- Runaway costs — A bot hitting your OpenAI-backed endpoint 10,000 times can cost hundreds of dollars in minutes
- Service degradation — Excessive requests slow down response times for your actual users
- Brute-force attacks — Login endpoints and password reset forms are prime targets
- Spam and abuse — Contact forms and comment systems get flooded without protection
- Resource exhaustion — Database connection pools can be depleted by unchecked traffic
Rate limiting isn't optional for production apps. It's a baseline security measure that sits right alongside authentication, input validation, and CSRF protection.
Understanding Rate Limiting Algorithms
Before writing any code, it helps to understand the three algorithms that appear in most rate limiting libraries. Each one makes a different tradeoff between strictness, memory usage, and burst tolerance.
Fixed Window
This is the simplest algorithm. It divides time into fixed intervals (say, one-minute windows) and counts requests within each window. When the count exceeds the limit, subsequent requests get rejected until the next window starts.
The downside? Boundary spikes. A user could make 10 requests at the end of one window and 10 more at the start of the next, effectively doubling the intended limit within a short period.
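To make the mechanics concrete, here's a minimal in-memory sketch of a fixed window counter (illustrative only, not tied to any library):

```typescript
// Fixed window: bucket requests into intervals of `windowMs` and count them.
type Window = { start: number; count: number };

function makeFixedWindow(limit: number, windowMs: number) {
  const windows = new Map<string, Window>();

  return function allow(id: string, now: number): boolean {
    // Align the window to fixed boundaries (0-60s, 60-120s, ...)
    const windowStart = Math.floor(now / windowMs) * windowMs;
    const entry = windows.get(id);

    if (!entry || entry.start !== windowStart) {
      // First request in a fresh window resets the counter
      windows.set(id, { start: windowStart, count: 1 });
      return true;
    }
    if (entry.count >= limit) return false;
    entry.count++;
    return true;
  };
}
```

You can see the boundary problem directly: with a limit of 10 per minute, 10 requests at t = 59s all pass, and 10 more at t = 61s also pass, because the second batch lands in a fresh window.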
Sliding Window
An improvement over fixed window. It calculates a weighted average between the current and previous windows, producing a smoother rate limit without that boundary spike problem. This is the algorithm most commonly recommended for web applications — and it's what we'll use throughout this guide.
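The weighted average works like this: count the requests in the current fixed window, then add the previous window's count scaled by how much of it still overlaps the sliding window. A sketch of just that calculation:

```typescript
// Sliding-window estimate of the request rate.
// prevCount: requests in the previous fixed window
// currCount: requests so far in the current window
// elapsedMs: time elapsed within the current window
function slidingCount(
  prevCount: number,
  currCount: number,
  elapsedMs: number,
  windowMs: number
): number {
  // Fraction of the previous window still inside the sliding window
  const overlap = (windowMs - elapsedMs) / windowMs;
  return prevCount * overlap + currCount;
}
```

Thirty seconds into a one-minute window with 40 previous and 30 current requests, the estimate is 40 × 0.5 + 30 = 50, so a limit of 60 still has headroom, with no cliff at the window boundary.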
Token Bucket
This one allows controlled bursts. Think of a "bucket" that fills with tokens at a fixed rate. Each request consumes one token. When the bucket is empty, requests are rejected. The bucket has a maximum capacity, so tokens only accumulate up to that limit. It's ideal for APIs where occasional traffic spikes are perfectly acceptable.
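A compact sketch of the refill-and-consume logic (timestamps are passed in explicitly to keep it testable):

```typescript
// Token bucket: refills continuously, allows bursts up to `capacity`.
class TokenBucket {
  private tokens: number;
  private last: number;

  constructor(private capacity: number, private refillPerSec: number, now = 0) {
    this.tokens = capacity; // start full
    this.last = now;
  }

  take(now: number): boolean {
    // Refill based on elapsed time, capped at capacity
    const elapsedSec = (now - this.last) / 1000;
    this.tokens = Math.min(this.capacity, this.tokens + elapsedSec * this.refillPerSec);
    this.last = now;

    if (this.tokens < 1) return false;
    this.tokens -= 1;
    return true;
  }
}
```

A bucket with capacity 3 and a refill rate of 1 token per second absorbs a burst of 3 requests immediately, then sustains 1 request per second.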
Setting Up Upstash Redis for Production
For any application beyond a hobby project, you need a distributed rate limiting store. In-memory counters reset on every deployment and don't work across multiple serverless function instances. Upstash Redis is the go-to choice for Next.js because it uses HTTP-based connections (no persistent TCP connections required), works in both Node.js and Edge runtimes, and is free for up to 10,000 requests per day.
Install the Dependencies
npm install @upstash/ratelimit @upstash/redis
Create an Upstash Redis Database
Sign up at upstash.com, create a new Redis database, and copy the REST URL and token into your .env.local file:
# .env.local
UPSTASH_REDIS_REST_URL=https://your-database.upstash.io
UPSTASH_REDIS_REST_TOKEN=your_token_here
Create a Shared Rate Limiter Module
A useful pattern is to define all your rate limiters in a single file so they can be reused across middleware, route handlers, and server actions:
// lib/rate-limit.ts
import { Ratelimit } from "@upstash/ratelimit";
import { Redis } from "@upstash/redis";

const redis = Redis.fromEnv();

// General API rate limiter: 60 requests per minute
export const apiLimiter = new Ratelimit({
  redis,
  limiter: Ratelimit.slidingWindow(60, "1 m"),
  analytics: true,
  prefix: "ratelimit:api",
});

// Strict limiter for auth endpoints: 5 attempts per minute
export const authLimiter = new Ratelimit({
  redis,
  limiter: Ratelimit.slidingWindow(5, "1 m"),
  analytics: true,
  prefix: "ratelimit:auth",
});

// AI endpoint limiter: 10 requests per minute
export const aiLimiter = new Ratelimit({
  redis,
  limiter: Ratelimit.slidingWindow(10, "1 m"),
  analytics: true,
  prefix: "ratelimit:ai",
});

// Contact form limiter: 3 submissions per hour
export const formLimiter = new Ratelimit({
  redis,
  limiter: Ratelimit.slidingWindow(3, "1 h"),
  analytics: true,
  prefix: "ratelimit:form",
});
The prefix option namespaces the Redis keys so each limiter tracks its counters independently. The analytics flag lets the Upstash dashboard display rate limiting metrics, which is handy for monitoring.
Rate Limiting in Next.js Middleware
Middleware is the most powerful place to apply rate limiting because it intercepts requests before they reach your route handlers or server actions. Abusive traffic gets blocked at the edge, which means you're not paying for serverless function invocations on requests that should've been rejected anyway.
// middleware.ts
import { NextRequest, NextResponse } from "next/server";
import { apiLimiter, authLimiter } from "@/lib/rate-limit";

export async function middleware(request: NextRequest) {
  // x-forwarded-for may hold a comma-separated chain of proxies; the first
  // entry is the client. (request.ip was removed in Next.js 15, so read the
  // header instead.) Fall back to a placeholder for local development.
  const ip =
    request.headers.get("x-forwarded-for")?.split(",")[0]?.trim() ?? "127.0.0.1";

  // Use stricter limits for auth endpoints
  const isAuthRoute = request.nextUrl.pathname.startsWith("/api/auth");
  const limiter = isAuthRoute ? authLimiter : apiLimiter;
  const identifier = isAuthRoute ? `auth:${ip}` : `api:${ip}`;

  const { success, limit, remaining, reset } = await limiter.limit(identifier);

  if (!success) {
    return NextResponse.json(
      { error: "Too many requests. Please try again later." },
      {
        status: 429,
        headers: {
          "X-RateLimit-Limit": limit.toString(),
          "X-RateLimit-Remaining": "0",
          "Retry-After": Math.ceil((reset - Date.now()) / 1000).toString(),
        },
      }
    );
  }

  const response = NextResponse.next();
  response.headers.set("X-RateLimit-Limit", limit.toString());
  response.headers.set("X-RateLimit-Remaining", remaining.toString());
  return response;
}

export const config = {
  matcher: ["/api/:path*"],
};
This middleware applies a sliding window of 60 requests per minute to all API routes and a stricter 5 requests per minute to authentication routes. The Retry-After header is standard HTTP; X-RateLimit-Limit and X-RateLimit-Remaining are a widely used de facto convention (the IETF draft that standardizes rate limit headers drops the X- prefix). Together they help clients implement proper backoff behavior.
Rate Limiting API Route Handlers
Sometimes you need per-route rate limiting that's more granular than what middleware provides. For example, an expensive search endpoint might need its own limit independent of other API routes.
// app/api/search/route.ts
import { NextRequest, NextResponse } from "next/server";
import { Ratelimit } from "@upstash/ratelimit";
import { Redis } from "@upstash/redis";

const searchLimiter = new Ratelimit({
  redis: Redis.fromEnv(),
  limiter: Ratelimit.slidingWindow(20, "1 m"),
  prefix: "ratelimit:search",
});

export async function GET(request: NextRequest) {
  const ip = request.headers.get("x-forwarded-for")?.split(",")[0] ?? "127.0.0.1";
  const { success, remaining, reset } = await searchLimiter.limit(ip);

  if (!success) {
    return NextResponse.json(
      { error: "Search rate limit exceeded" },
      {
        status: 429,
        headers: {
          "Retry-After": Math.ceil((reset - Date.now()) / 1000).toString(),
        },
      }
    );
  }

  const query = request.nextUrl.searchParams.get("q") ?? "";
  // ... perform search logic
  return NextResponse.json({ results: [] });
}
If you also have middleware-level rate limiting, the route handler limit acts as a second layer of defense. A request has to pass both the global middleware limit and the route-specific limit to get through.
Rate Limiting Server Actions
Server actions are one of the most overlooked attack surfaces in Next.js applications. Under the hood, every server action is a POST endpoint that can be called directly with a fetch request — no UI required. If your server action sends emails, writes to a database, or calls an external API, it must be rate limited.
The challenge with server actions is that they don't expose a Request object. You need to use the headers() function from next/headers to extract the client IP address:
// app/actions/contact.ts
"use server";

import { headers } from "next/headers";
import { formLimiter } from "@/lib/rate-limit";

export async function submitContactForm(formData: FormData) {
  const headerList = await headers();
  const ip = headerList.get("x-forwarded-for")?.split(",")[0] ?? "127.0.0.1";

  const { success } = await formLimiter.limit(`contact:${ip}`);
  if (!success) {
    return {
      error: "You have submitted too many messages. Please try again later.",
    };
  }

  const name = formData.get("name") as string;
  const email = formData.get("email") as string;
  const message = formData.get("message") as string;

  // Validate inputs...
  // Send email or save to database...
  return { success: true };
}
Rate Limiting Authenticated Server Actions
When the user is authenticated, you should use their user ID instead of the IP address. This prevents issues with shared networks (like offices or universities) where many users share the same IP:
// app/actions/ai-chat.ts
"use server";

import { headers } from "next/headers";
import { auth } from "@/lib/auth";
import { aiLimiter } from "@/lib/rate-limit";

export async function sendChatMessage(message: string) {
  const session = await auth();

  // Use user ID for authenticated users, IP for anonymous
  let identifier: string;
  if (session?.user?.id) {
    identifier = `ai:user:${session.user.id}`;
  } else {
    const headerList = await headers();
    const ip = headerList.get("x-forwarded-for")?.split(",")[0] ?? "127.0.0.1";
    identifier = `ai:anon:${ip}`;
  }

  const { success, remaining } = await aiLimiter.limit(identifier);
  if (!success) {
    return { error: "Rate limit reached. Please wait before sending another message." };
  }

  // Call OpenAI or other AI provider...
  return { response: "AI response here", remaining };
}
Protecting AI Endpoints from Cost Overruns
AI endpoints deserve special attention because each request can cost real money. A single GPT-4 call can cost anywhere from $0.03 to $0.12 depending on token count. An unprotected endpoint receiving 1,000 malicious requests? That's $30–120 gone in minutes. I've seen this happen to a colleague's side project, and it's not fun.
Beyond basic rate limiting, you'll want to implement these additional layers for AI endpoints:
Tiered Rate Limits by User Plan
// lib/ai-rate-limit.ts
import { Ratelimit } from "@upstash/ratelimit";
import { Redis } from "@upstash/redis";

const redis = Redis.fromEnv();

export function getAiLimiter(plan: "free" | "pro" | "enterprise") {
  const limits = {
    free: { requests: 5, window: "1 h" as const },
    pro: { requests: 100, window: "1 h" as const },
    enterprise: { requests: 1000, window: "1 h" as const },
  };

  const { requests, window } = limits[plan];
  return new Ratelimit({
    redis,
    limiter: Ratelimit.slidingWindow(requests, window),
    prefix: `ratelimit:ai:${plan}`,
  });
}
Token Budget Tracking
Rate limiting by request count alone isn't enough for AI endpoints. A single request with a massive prompt can consume thousands of tokens. Track token usage alongside request counts:
// app/api/ai/chat/route.ts
import { NextRequest, NextResponse } from "next/server";
import { Redis } from "@upstash/redis";
import { getAiLimiter } from "@/lib/ai-rate-limit";
import { auth } from "@/lib/auth";

const redis = Redis.fromEnv();
const DAILY_TOKEN_BUDGET = 50_000; // tokens per user per day

export async function POST(request: NextRequest) {
  const session = await auth();
  if (!session?.user) {
    return NextResponse.json({ error: "Unauthorized" }, { status: 401 });
  }

  // Check request rate limit (the plan field comes from your auth session)
  const limiter = getAiLimiter(session.user.plan ?? "free");
  const { success } = await limiter.limit(session.user.id);
  if (!success) {
    return NextResponse.json({ error: "Rate limit exceeded" }, { status: 429 });
  }

  // Check daily token budget (the key is scoped to today's date)
  const tokenKey = `tokens:${session.user.id}:${new Date().toISOString().split("T")[0]}`;
  const usedTokens = (await redis.get<number>(tokenKey)) ?? 0;
  if (usedTokens >= DAILY_TOKEN_BUDGET) {
    return NextResponse.json(
      { error: "Daily token budget exceeded" },
      { status: 429 }
    );
  }

  // Call your AI provider and track usage (callAiProvider is a placeholder
  // for whatever client you use)
  const { text, tokensUsed } = await callAiProvider(request);
  await redis.incrby(tokenKey, tokensUsed);
  await redis.expire(tokenKey, 86_400); // garbage-collect the key after 24 hours

  return NextResponse.json({
    text,
    tokensRemaining: DAILY_TOKEN_BUDGET - usedTokens - tokensUsed,
  });
}
In-Memory Rate Limiting Without External Dependencies
Not every project needs Redis right away. If you're building a prototype or running a single-instance Node.js server (not serverless), you can implement rate limiting without any external packages. This approach uses a simple Map to track request counts:
// lib/memory-rate-limit.ts
type RateLimitEntry = {
  count: number;
  resetTime: number;
};

const store = new Map<string, RateLimitEntry>();

export function rateLimit(
  identifier: string,
  maxRequests: number,
  windowMs: number
): { success: boolean; remaining: number } {
  const now = Date.now();
  const entry = store.get(identifier);

  // Evict expired entries once the store grows past 10,000 keys
  if (store.size > 10_000) {
    for (const [key, value] of store) {
      if (value.resetTime < now) store.delete(key);
    }
  }

  if (!entry || entry.resetTime < now) {
    store.set(identifier, { count: 1, resetTime: now + windowMs });
    return { success: true, remaining: maxRequests - 1 };
  }

  if (entry.count >= maxRequests) {
    return { success: false, remaining: 0 };
  }

  entry.count++;
  return { success: true, remaining: maxRequests - entry.count };
}
And here's how you'd use it in a route handler:
// app/api/hello/route.ts
import { NextRequest, NextResponse } from "next/server";
import { rateLimit } from "@/lib/memory-rate-limit";

export async function GET(request: NextRequest) {
  const ip = request.headers.get("x-forwarded-for")?.split(",")[0] ?? "127.0.0.1";
  const { success, remaining } = rateLimit(ip, 30, 60_000); // 30 req/min

  if (!success) {
    return NextResponse.json({ error: "Too many requests" }, { status: 429 });
  }

  return NextResponse.json({ message: "Hello", remaining });
}
Important caveat: In-memory rate limiting doesn't work in serverless environments because each function invocation gets its own memory space. It also won't persist across server restarts. Use this only for development, prototyping, or single-server deployments.
Optimizing for DDoS Scenarios with Ephemeral Cache
During a DDoS attack, even the Redis calls from your rate limiter can become a bottleneck. Upstash provides an ephemeral cache feature that stores rate limit data in memory as long as the serverless function instance stays warm. Repeated requests from the same attacker skip the Redis roundtrip entirely:
// lib/rate-limit-optimized.ts
import { Ratelimit } from "@upstash/ratelimit";
import { Redis } from "@upstash/redis";

// Declare the cache outside the handler so it persists across warm invocations
const cache = new Map();

export const optimizedLimiter = new Ratelimit({
  redis: Redis.fromEnv(),
  limiter: Ratelimit.slidingWindow(60, "1 m"),
  ephemeralCache: cache,
  analytics: true,
  prefix: "ratelimit:optimized",
});
The ephemeralCache option dramatically reduces Redis calls during traffic spikes. When the function is "hot" (processing multiple requests within the same instance), rate limit decisions happen from the local cache with zero network overhead. It's a small addition that can make a big difference under pressure.
Handling Rate Limit Responses on the Client
A good rate limiting implementation isn't complete without proper client-side handling. Nobody wants to see a cryptic error message. When calling server actions from your React components, handle the rate limit error gracefully:
// app/components/ContactForm.tsx
"use client";

import { useState } from "react";
import { submitContactForm } from "@/app/actions/contact";

type Status = "idle" | "loading" | "success" | "rate-limited" | "error";

export function ContactForm() {
  const [status, setStatus] = useState<Status>("idle");

  async function handleSubmit(formData: FormData) {
    setStatus("loading");
    const result = await submitContactForm(formData);
    if ("error" in result) {
      setStatus(result.error.includes("too many") ? "rate-limited" : "error");
      return;
    }
    setStatus("success");
  }

  return (
    <form action={handleSubmit}>
      {/* form fields */}
      <button type="submit" disabled={status === "loading" || status === "rate-limited"}>
        {status === "loading" ? "Sending..." : "Send Message"}
      </button>
      {status === "rate-limited" && (
        <p className="text-red-600">
          You have sent too many messages. Please wait a few minutes before trying again.
        </p>
      )}
    </form>
  );
}
For API route handlers, clients should read the Retry-After header and implement exponential backoff before retrying.
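One piece worth isolating is the delay calculation. Here's a small helper (a sketch; the function name is my own) that honors Retry-After when present and falls back to exponential backoff:

```typescript
// Delay before retrying a 429 response, in milliseconds.
// Prefers the server's Retry-After header (in seconds); otherwise
// backs off exponentially on the attempt number, capped at 30s.
function retryDelayMs(retryAfterHeader: string | null, attempt: number): number {
  const seconds = Number(retryAfterHeader);
  if (retryAfterHeader !== null && Number.isFinite(seconds) && seconds >= 0) {
    return seconds * 1000;
  }
  return Math.min(30_000, 1000 * 2 ** attempt); // 1s, 2s, 4s, ...
}
```

A client loop would call this after each 429, e.g. `await new Promise((r) => setTimeout(r, retryDelayMs(res.headers.get("Retry-After"), attempt)))` before the next attempt.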
Testing Your Rate Limiter
You can test rate limiting locally using a quick script that fires rapid requests:
# Test API route rate limiting with curl
for i in $(seq 1 15); do
  echo "Request $i:"
  curl -s -o /dev/null -w "HTTP %{http_code}\n" "http://localhost:3000/api/search?q=test"
done
For automated testing with Vitest, mock the rate limiter and invoke the middleware directly, so the test never touches Redis or needs a running server:
// __tests__/rate-limit.test.ts
import { describe, it, expect, vi } from "vitest";
import { NextRequest } from "next/server";

// Mock the shared limiters so the test never hits Redis
vi.mock("@/lib/rate-limit", () => ({
  apiLimiter: {
    limit: vi.fn()
      .mockResolvedValueOnce({ success: true, limit: 60, remaining: 59, reset: Date.now() + 60_000 })
      .mockResolvedValueOnce({ success: false, limit: 60, remaining: 0, reset: Date.now() + 60_000 }),
  },
  authLimiter: { limit: vi.fn() },
}));

// vi.mock is hoisted, so this import receives the mocked limiters
import { middleware } from "@/middleware";

describe("Rate limited middleware", () => {
  it("passes the request when under the limit", async () => {
    const response = await middleware(new NextRequest("http://localhost:3000/api/search?q=test"));
    expect(response.status).toBe(200);
  });

  it("returns 429 when over the limit", async () => {
    const response = await middleware(new NextRequest("http://localhost:3000/api/search?q=test"));
    expect(response.status).toBe(429);
  });
});
Choosing the Right Strategy
So, which approach should you pick? Here's a quick decision matrix:
- Global protection for all API routes — Use middleware with Upstash. It blocks traffic before serverless functions are invoked, saving you money.
- Per-route limits for expensive operations — Add route-handler-level limiters on top of middleware for endpoints that need tighter controls.
- Server action protection — Rate limit inside the action using headers() to get the client IP. Essential for forms, mutations, and AI calls.
- AI cost control — Combine request rate limiting with token budget tracking. Use tiered limits based on user plan.
- Development and prototyping — In-memory rate limiting works fine. Just switch to Redis before deploying to production.
For most production applications, the setup I'd recommend is middleware-level rate limiting with Upstash Redis for global protection, plus per-action rate limiting for sensitive server actions and AI endpoints. It gives you coverage at every layer without being overly complicated.
Frequently Asked Questions
How do I rate limit Next.js server actions?
Server actions don't expose a Request object, so you can't read the IP directly. Instead, use the headers() function from next/headers to access the x-forwarded-for header and extract the client IP. Pass that IP to your rate limiter (such as @upstash/ratelimit) at the beginning of the action, and return an error if the limit is exceeded.
Does rate limiting work with Next.js Edge Runtime?
Yes, it does. The @upstash/ratelimit library uses HTTP-based connections instead of persistent TCP connections, making it fully compatible with both the Node.js runtime and the Edge runtime. You can use it in middleware (which runs on the edge by default) and in route handlers configured with export const runtime = "edge".
What is the difference between middleware rate limiting and route handler rate limiting?
Middleware rate limiting runs before any route handler is invoked, meaning blocked requests never trigger a serverless function invocation — that saves costs on platforms like Vercel. Route handler rate limiting gives you more granular control, letting you apply different limits to different endpoints. For the best protection, use both: middleware for global limits and route handlers for endpoint-specific limits.
Can I rate limit by user ID instead of IP address?
Absolutely, and it's often the better choice. IP-based limiting can unfairly affect users behind shared networks (offices, universities, VPNs). When users are authenticated, use their database user ID as the rate limit identifier. For unauthenticated endpoints like login or registration, IP-based limiting is really your only reliable option.
How do I handle rate limiting in development without Redis?
You've got two solid options. First, you can use the in-memory rate limiter described in this guide, which stores counters in a JavaScript Map. Second, you can spin up a local Redis instance with Docker (docker run -p 6379:6379 redis); note that @upstash/redis speaks HTTP rather than the Redis protocol, so you'd either put Upstash's serverless-redis-http proxy in front of it or use a standard Redis client locally. Many teams simply skip rate limiting in development and enable it only in production behind an environment variable check.
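That environment check can be as small as a one-line helper. A sketch (the RATE_LIMIT_ENABLED variable name is my own invention):

```typescript
// Decide whether to enforce rate limiting in the current environment.
export function rateLimitEnabled(env: Record<string, string | undefined>): boolean {
  // Always on in production; opt in elsewhere via a flag
  return env.NODE_ENV === "production" || env.RATE_LIMIT_ENABLED === "true";
}
```

In a route handler or action you'd guard the limiter call with it: `if (rateLimitEnabled(process.env)) { const { success } = await apiLimiter.limit(ip); ... }`.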