System Design Guide 2026
Master scalable architecture patterns: API Gateway, Rate Limiter, Load Balancer, Caching, CQRS, and distributed systems design for modern applications.
Modern System Architecture 2026
Building scalable systems requires understanding fundamental patterns and trade-offs. This guide covers essential components for designing systems that handle millions of requests.
Request flow: Clients (Web/Mobile Apps) -> API Gateway (Single Entry Point) -> Load Balancer (Traffic Distribution) -> Microservices (Scalable Services)
API Gateway Pattern
The API Gateway is a single entry point for all client requests, handling cross-cutting concerns like authentication, routing, and rate limiting.
Core Responsibilities (Medium Complexity)
API Gateway Implementation
const express = require('express');
const proxy = require('express-http-proxy');
const rateLimit = require('express-rate-limit');
const jwt = require('jsonwebtoken');

const app = express();
app.use(express.json());

// Authentication middleware: expects "Authorization: Bearer <token>"
const authenticate = (req, res, next) => {
  const [scheme, token] = (req.headers.authorization || '').split(' ');
  if (!/^Bearer$/i.test(scheme) || !token) {
    return res.status(401).json({ error: 'No token provided' });
  }
  try {
    // Throws if the signature is invalid or the token has expired
    req.user = jwt.verify(token, process.env.JWT_SECRET);
    next();
  } catch (error) {
    return res.status(401).json({ error: 'Invalid token' });
  }
};

// Rate limiting: each IP gets 100 requests per 15-minute window
const limiter = rateLimit({
  windowMs: 15 * 60 * 1000, // 15 minutes
  max: 100, // Limit each IP to 100 requests per windowMs
  message: 'Too many requests, please try again later.'
});

// Health check endpoint, registered before the limiter so probes are not throttled
app.get('/health', (req, res) => {
  res.json({ status: 'healthy', timestamp: new Date().toISOString() });
});

// Apply rate limiting to all routes registered below
app.use(limiter);

// Route to backend services. Express strips the mount path, so by default
// /api/users/42 reaches the user service as /42.
app.use('/api/users', authenticate, proxy('http://user-service:3001'));
app.use('/api/products', authenticate, proxy('http://product-service:3002'));
app.use('/api/orders', authenticate, proxy('http://order-service:3003'));

// Start server
const PORT = process.env.PORT || 3000;
app.listen(PORT, () => {
  console.log(`API Gateway running on port ${PORT}`);
});
Advantages
- Centralized security management
- Reduced client complexity
- Better observability
- Protocol translation
Disadvantages
- Single point of failure
- Performance bottleneck
- Increased complexity
- Operational overhead
Rate Limiter Pattern
Protect your APIs from abuse and ensure fair usage by implementing rate limiting strategies.
Rate Limiting Algorithms (Low Complexity)
| Algorithm | Description | Use Case |
|---|---|---|
| Token Bucket | Tokens are added at a fixed rate up to a maximum capacity; each request consumes one token | API rate limiting that tolerates short bursts |
| Leaky Bucket | Requests are processed at a constant rate; excess requests are dropped | Network traffic shaping |
| Fixed Window | Counts requests in fixed time windows; bursts can straddle a window boundary | Simple rate limiting |
| Sliding Window | Counts requests in a window that moves with the current time | Accurate rate limiting |
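The fixed window row is the simplest of the four to show in code. The following is a minimal in-memory sketch, not a production limiter: the class name `FixedWindowLimiter`, its `check` method, and the injectable `clock` parameter are illustrative assumptions, and the counters live in one process only.

```javascript
// Minimal in-memory fixed-window limiter (illustrative; state is not shared across processes).
class FixedWindowLimiter {
  constructor(limit, windowMs, clock = Date.now) {
    this.limit = limit;
    this.windowMs = windowMs;
    this.clock = clock;        // injectable so tests can control time
    this.counters = new Map(); // key -> { windowId, count }
  }

  check(key) {
    const now = this.clock();
    // All timestamps in the same window share one integer id
    const windowId = Math.floor(now / this.windowMs);
    let entry = this.counters.get(key);
    if (!entry || entry.windowId !== windowId) {
      // First request in a new window: start counting from zero
      entry = { windowId, count: 0 };
      this.counters.set(key, entry);
    }
    const allowed = entry.count < this.limit;
    if (allowed) entry.count += 1;
    return {
      allowed,
      remaining: this.limit - entry.count,
      resetAt: (windowId + 1) * this.windowMs, // start of the next window (ms)
    };
  }
}
```

Because the counter resets at each window boundary, a client can send up to `limit` requests at the end of one window and `limit` more at the start of the next. The sliding window approach avoids that at the cost of storing per-request timestamps.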
Redis-based Rate Limiter
const Redis = require('ioredis');

class RateLimiter {
  constructor(redisConfig) {
    this.redis = new Redis(redisConfig);
  }

  /**
   * Sliding window rate limiter
   * @param {string} key - User or IP identifier
   * @param {number} limit - Maximum requests
   * @param {number} windowMs - Time window in milliseconds
   * @returns {Promise<{allowed: boolean, remaining: number, reset: number}>}
   */
  async checkRateLimit(key, limit = 100, windowMs = 60000) {
    const now = Date.now();
    const windowStart = now - windowMs;
    // Unique member so two requests in the same millisecond are both counted
    const member = `${now}:${Math.random().toString(36).slice(2)}`;

    // Use a Redis sorted set for the sliding window; MULTI runs the steps atomically
    const results = await this.redis
      .multi()
      .zremrangebyscore(key, 0, windowStart)   // Remove timestamps outside the window
      .zadd(key, now, member)                  // Record this request (rejected ones count too)
      .zcard(key)                              // Count requests in the window
      .expire(key, Math.ceil(windowMs / 1000)) // Let idle keys expire
      .exec();

    const requestCount = results[2][1];
    return {
      allowed: requestCount <= limit,
      remaining: Math.max(0, limit - requestCount),
      // Upper bound: by then every request currently in the window has aged out
      reset: now + windowMs
    };
  }

  /**
   * Token bucket rate limiter
   * @param {string} key - User or IP identifier
   * @param {number} capacity - Maximum tokens in the bucket
   * @param {number} refillRate - Tokens added per second
   */
  async tokenBucketRateLimit(key, capacity = 10, refillRate = 1) {
    const now = Math.floor(Date.now() / 1000);
    const bucketKey = `token_bucket:${key}`;
    // The script runs atomically inside Redis, so concurrent requests cannot race
    const luaScript = `
      local key = KEYS[1]
      local capacity = tonumber(ARGV[1])
      local refillRate = tonumber(ARGV[2])
      local now = tonumber(ARGV[3])
      local bucket = redis.call('hmget', key, 'tokens', 'lastRefill')
      local tokens = tonumber(bucket[1]) or capacity
      local lastRefill = tonumber(bucket[2]) or now
      -- Calculate refill
      local timePassed = now - lastRefill
      local refillAmount = timePassed * refillRate
      tokens = math.min(capacity, tokens + refillAmount)
      -- Check if request can be processed
      if tokens >= 1 then
        tokens = tokens - 1
        redis.call('hmset', key, 'tokens', tokens, 'lastRefill', now)
        redis.call('expire', key, 3600)
        return {1, tokens} -- allowed, remaining tokens
      else
        return {0, tokens} -- not allowed, remaining tokens
      end
    `;
    const result = await this.redis.eval(
      luaScript, 1, bucketKey, capacity, refillRate, now
    );
    return {
      allowed: result[0] === 1,
      // Redis converts Lua numbers to integers, so fractional tokens are truncated
      remaining: result[1]
    };
  }
}
// Usage example
const limiter = new RateLimiter({ host: 'localhost', port: 6379 });

async function handleRequest(userId) {
  const result = await limiter.checkRateLimit(`user:${userId}`, 100, 60000);
  if (!result.allowed) {
    throw new Error(`Rate limit exceeded. Try again in ${Math.ceil((result.reset - Date.now()) / 1000)} seconds`);
  }
  // Process request...
  console.log(`Request allowed. ${result.remaining} requests remaining.`);
}
Load Balancer Patterns
Distribute traffic across multiple servers to ensure high availability and optimal resource utilization.
Load Balancing Algorithms (Medium Complexity)
Traffic flow: Clients (Traffic Source) -> Load Balancer (Distributes Traffic) -> Server 1, Server 2, Server 3 (Instances 1-3)
| Algorithm | Description | Best For |
|---|---|---|
| Round Robin | Requests distributed sequentially to each server | Equal capacity servers |
| Least Connections | Send request to server with fewest active connections | Long-lived connections |
| IP Hash | Client IP determines server (sticky sessions) | Session persistence needed |
| Weighted Round Robin | Servers get requests proportional to their weight/capacity | Heterogeneous server capacity |
| Least Response Time | Forward to server with fastest response time | Performance optimization |
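Three of the algorithms in the table can be sketched in a few lines each. This is an illustrative sketch rather than how any particular load balancer implements them: the class and function names (`RoundRobinBalancer`, `LeastConnectionsBalancer`, `pickByIpHash`) are assumptions, and servers are represented as plain strings.

```javascript
const crypto = require('crypto');

// Round robin: hand out servers in order and wrap around at the end.
class RoundRobinBalancer {
  constructor(servers) {
    this.servers = servers;
    this.index = 0;
  }

  next() {
    const server = this.servers[this.index];
    this.index = (this.index + 1) % this.servers.length;
    return server;
  }
}

// Least connections: track active connections and pick the least-loaded server.
class LeastConnectionsBalancer {
  constructor(servers) {
    this.active = new Map(servers.map((server) => [server, 0]));
  }

  // Pick a server and count the new connection against it.
  acquire() {
    let best = null;
    for (const [server, count] of this.active) {
      if (best === null || count < this.active.get(best)) best = server;
    }
    this.active.set(best, this.active.get(best) + 1);
    return best;
  }

  // Call when the connection closes.
  release(server) {
    this.active.set(server, Math.max(0, this.active.get(server) - 1));
  }
}

// IP hash: the same client IP always maps to the same server
// as long as the server list does not change.
function pickByIpHash(servers, clientIp) {
  const digest = crypto.createHash('sha256').update(clientIp).digest();
  return servers[digest.readUInt32BE(0) % servers.length];
}
```

A short usage example: `new RoundRobinBalancer(['a', 'b', 'c'])` returns `a`, `b`, `c`, then `a` again on successive `next()` calls. Note that with IP hash, adding or removing a server remaps most clients, which matters when session persistence is the goal.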
Caching Strategies & Patterns
Reduce latency and database load with strategic caching at multiple levels.
Multi-Level Caching Architecture (High Complexity)
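One way to picture caching at multiple levels is a small per-process cache (L1) in front of a shared cache (L2), with the data source behind both. The sketch below assumes hypothetical names throughout: `TwoLevelCache` and `MapStore` are illustrative, and `MapStore` is only an in-memory stand-in for a shared store such as Redis.

```javascript
// L1: per-process Map with a short TTL. L2: any async store exposing get/set.
class TwoLevelCache {
  constructor(l2, l1TtlMs = 5000, clock = Date.now) {
    this.l1 = new Map(); // key -> { value, expiresAt }
    this.l2 = l2;
    this.l1TtlMs = l1TtlMs;
    this.clock = clock;  // injectable so tests can control time
  }

  async get(key, fetchFn) {
    // 1. Fastest path: unexpired entry in local memory
    const hit = this.l1.get(key);
    if (hit && hit.expiresAt > this.clock()) return hit.value;

    // 2. Shared cache, then the source of truth on a miss
    let value = await this.l2.get(key);
    if (value === undefined) {
      value = await fetchFn();
      await this.l2.set(key, value);
    }

    // 3. Populate L1 so the next read in this process stays local
    this.l1.set(key, { value, expiresAt: this.clock() + this.l1TtlMs });
    return value;
  }
}

// Hypothetical stand-in for a shared cache; returns undefined on a miss.
class MapStore {
  constructor() {
    this.data = new Map();
  }

  async get(key) {
    return this.data.get(key);
  }

  async set(key, value) {
    this.data.set(key, value);
  }
}
```

The short L1 TTL is a deliberate trade-off in this sketch: each process may serve data that is stale by up to `l1TtlMs`, in exchange for skipping a network round trip on hot keys.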
Cache Patterns Implementation
const Redis = require('ioredis');

class CacheService {
  constructor() {
    this.redis = new Redis({
      host: process.env.REDIS_HOST,
      port: Number(process.env.REDIS_PORT) || 6379,
      retryStrategy: (times) => {
        // Linear backoff for reconnection: 50 ms per attempt, capped at 2 seconds
        const delay = Math.min(times * 50, 2000);
        return delay;
      }
    });
  }

  /**
   * Cache-Aside pattern implementation
   */
  async getOrSet(key, fetchFn, ttlSeconds = 300) {
    // Try to get from cache (Redis returns null on a miss)
    const cachedData = await this.redis.get(key);
    if (cachedData !== null) {
      return JSON.parse(cachedData);
    }
    // Cache miss - fetch from source
    const freshData = await fetchFn();
    // Store in cache with TTL; skip undefined, which JSON cannot represent
    if (freshData !== undefined) {
      await this.redis.setex(key, ttlSeconds, JSON.stringify(freshData));
    }
    return freshData;
  }

  /**
   * Write-Through cache pattern
   */
  async writeThrough(key, data, writeFn, ttlSeconds = 300) {
    // Write to database first
    await writeFn(data);
    // Then write to cache
    await this.redis.setex(key, ttlSeconds, JSON.stringify(data));
    return data;
  }

  /**
   * Cache invalidation strategies
   */
  async invalidateCache(pattern) {
    // Delete specific key
    if (!pattern.includes('*')) {
      return await this.redis.del(pattern);
    }
    // Delete keys matching pattern. SCAN iterates in batches, unlike KEYS,
    // which blocks Redis while it walks the whole keyspace.
    let deleted = 0;
    const stream = this.redis.scanStream({ match: pattern, count: 100 });
    for await (const keys of stream) {
      if (keys.length > 0) {
        deleted += await this.redis.del(...keys);
      }
    }
    return deleted;
  }

  /**
   * Cache warming on application startup
   */
  async warmCache(keysToWarm, fetchFn) {
    const promises = keysToWarm.map(async (key) => {
      const data = await fetchFn(key);
      await this.redis.setex(key, 3600, JSON.stringify(data));
    });
    await Promise.all(promises);
    console.log(`Warmed cache for ${keysToWarm.length} keys`);
  }
}

// Usage example (UserModel is assumed to be defined elsewhere in the application)
const cache = new CacheService();

async function getUserProfile(userId) {
  return await cache.getOrSet(
    `user:${userId}:profile`,
    async () => {
      // This function is called only on cache miss
      console.log(`Cache miss for user ${userId}, fetching from DB`);
      return await UserModel.findById(userId);
    },
    600 // 10 minutes TTL
  );
}
In-Memory Cache
- Redis: Rich data structures, persistence
- Memcached: Simple, multi-threaded
- Hazelcast: Distributed, fault-tolerant
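Disk-Backed and Managed Cache
- Ehcache: Java, tiered storage (heap, off-heap, disk)
- Couchbase: Document store with a built-in memory cache
- AWS ElastiCache: Managed Redis/Memcached (in-memory, listed here as a managed service)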
CQRS (Command Query Responsibility Segregation)
Separate read and write operations to optimize for different scalability requirements.
CQRS Architecture Overview (High Complexity)
Flow: Command Side (Write Operations) -> Event Bus (Publish Events) -> Read Side (Optimized for Reads)
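The piece that connects the two sides is the event bus. As a rough sketch of that wiring, the following uses a hypothetical in-process bus (`InMemoryEventBus`) and a plain `Map` as the read model; both are illustrative stand-ins, and a real deployment would more likely use a message broker and a separate read database.

```javascript
// Minimal in-process event bus: handlers are registered per event type.
class InMemoryEventBus {
  constructor() {
    this.handlers = new Map(); // event type -> array of handler functions
  }

  subscribe(type, handler) {
    const list = this.handlers.get(type) || [];
    list.push(handler);
    this.handlers.set(type, list);
  }

  // Deliver the event to every handler for its type, one at a time.
  async publish(event) {
    for (const handler of this.handlers.get(event.type) || []) {
      await handler(event);
    }
  }
}

// Read model kept in a Map for illustration only.
const readModel = new Map();
const bus = new InMemoryEventBus();

// Projections: translate write-side events into read-side state.
bus.subscribe('UserCreated', async (event) => {
  const { userId, email, name } = event.data;
  readModel.set(userId, { email, name });
});

bus.subscribe('UserEmailUpdated', async (event) => {
  const user = readModel.get(event.data.userId);
  if (user) user.email = event.data.newEmail;
});

// Publishes two events and returns the resulting read-side view.
async function demo() {
  await bus.publish({
    type: 'UserCreated',
    data: { userId: 'u1', email: 'a@example.com', name: 'Ada' },
  });
  await bus.publish({
    type: 'UserEmailUpdated',
    data: { userId: 'u1', newEmail: 'ada@example.com' },
  });
  return readModel.get('u1');
}
```

In this sketch `publish` waits for each handler, so the read model is current as soon as the call returns. With an asynchronous broker the read side lags the write side, and queries may briefly return older data.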
CQRS Implementation Example
const { v4: uuidv4 } = require('uuid');

// Command side - Write model
class UserCommandService {
  constructor(eventStore, eventBus) {
    this.eventStore = eventStore;
    this.eventBus = eventBus;
  }

  async createUser(command) {
    const userId = uuidv4();
    // Create events
    const events = [
      {
        type: 'UserCreated',
        data: {
          userId,
          email: command.email,
          name: command.name,
          timestamp: new Date().toISOString()
        }
      }
    ];
    // Store events first so they are durable before anyone is notified
    await this.eventStore.appendEvents(userId, events);
    // Publish events
    for (const event of events) {
      await this.eventBus.publish(event);
    }
    return userId;
  }

  async updateUserEmail(userId, newEmail) {
    const events = [
      {
        type: 'UserEmailUpdated',
        data: {
          userId,
          newEmail,
          timestamp: new Date().toISOString()
        }
      }
    ];
    await this.eventStore.appendEvents(userId, events);
    for (const event of events) {
      await this.eventBus.publish(event);
    }
  }
}
// Read side - Optimized for queries
class UserReadService {
  constructor(readDatabase) {
    this.readDatabase = readDatabase;
  }

  async getUser(userId) {
    // Direct query to optimized read database
    return await this.readDatabase.users
      .findOne({ _id: userId })
      .lean(); // Fast read-only query
  }

  async searchUsers(query, page = 1, limit = 20) {
    // Escape regex metacharacters so user input is matched literally
    const escaped = query.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
    // Complex queries on optimized schema
    const filter = {
      $or: [
        { name: { $regex: escaped, $options: 'i' } },
        { email: { $regex: escaped, $options: 'i' } }
      ]
    };
    return await this.readDatabase.users
      .find(filter)
      .skip((page - 1) * limit)
      .limit(limit)
      .lean();
  }
}
// Event handler to update read model
class UserProjection {
  constructor(readDatabase) {
    this.readDatabase = readDatabase;
  }

  async handleUserCreated(event) {
    await this.readDatabase.users.insertOne({
      _id: event.data.userId,
      email: event.data.email,
      name: event.data.name,
      createdAt: event.data.timestamp,
      updatedAt: event.data.timestamp
    });
  }

  async handleUserEmailUpdated(event) {
    await this.readDatabase.users.updateOne(
      { _id: event.data.userId },
      {
        $set: {
          email: event.data.newEmail,
          updatedAt: event.data.timestamp
        }
      }
    );
  }
}
Building Scalable Systems
Principles and patterns for designing systems that can handle growth gracefully.
Scalability Dimensions (High Complexity)
Scalability Checklist
| Area | Requirements | Tools/Solutions |
|---|---|---|
| Database | Sharding, replication, read replicas, connection pooling | MongoDB sharding, PostgreSQL read replicas, Redis Cluster |
| Caching | Multi-level caching, cache invalidation, CDN | Redis Cluster, Varnish, CloudFront, Akamai |
| Message Queue | Async processing, event-driven architecture, backpressure | Kafka, RabbitMQ, AWS SQS, Google Pub/Sub |
| Monitoring | Real-time metrics, distributed tracing, logging | Prometheus, Grafana, Jaeger, ELK Stack |
| Deployment | Blue-green, canary, rolling updates, auto-scaling | Kubernetes, Docker, AWS ECS, Terraform |
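To make the sharding entry in the Database row concrete, here is a minimal sketch of hash-based shard routing. The function name `shardFor` and the key format are illustrative assumptions; databases that support sharding natively provide their own routing.

```javascript
const crypto = require('crypto');

// Map a record key to one of `shardCount` shards using a stable hash.
function shardFor(key, shardCount) {
  const digest = crypto.createHash('sha256').update(String(key)).digest();
  // Use the first 4 bytes of the digest as an unsigned integer
  return digest.readUInt32BE(0) % shardCount;
}

// The same key always routes to the same shard for a given shard count.
const shard = shardFor('user:42', 4);
console.log(`user:42 lives on shard ${shard}`);
```

Plain modulo routing remaps most keys when `shardCount` changes, which is why schemes such as consistent hashing are often preferred when shards are added or removed regularly.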
Real-World Architecture: E-commerce Platform
How to apply these patterns in a real e-commerce system handling 10,000 requests per second.
Request path: Mobile App (Client) -> CDN (Static Assets) -> WAF (Security) -> API Gateway (Routing) -> Load Balancer (Traffic Distribution)
Services: Cart Service (Redis Cache), Product Service (Read Replicas), Payment Service (Queue-based), Order Service (Event-Driven)
