System Design for Beginners

System Design Guide 2026 – Scalable Architecture Patterns

Master scalable architecture patterns: API Gateway, Rate Limiter, Load Balancer, Caching, CQRS, and distributed systems design for modern applications.

Modern System Architecture 2026

Building scalable systems requires understanding fundamental patterns and trade-offs. This guide covers essential components for designing systems that handle millions of requests.

Figure: Modern Scalable Architecture. Clients (Web/Mobile Apps) → API Gateway (single entry point) → Load Balancer (traffic distribution) → Microservices (scalable services).

Key Principle: Design for horizontal scaling, anticipate failures, and implement observability from day one.

API Gateway Pattern

The API Gateway is a single entry point for all client requests, handling cross-cutting concerns like authentication, routing, and rate limiting.

Core Responsibilities (Complexity: Medium)

Request Routing: Route requests to appropriate microservices
Authentication: Validate JWT tokens and API keys
Rate Limiting: Prevent API abuse
Response Aggregation: Combine multiple service responses
Monitoring: Log requests and track metrics

API Gateway Implementation

Node.js API Gateway with Express
const express = require('express');
const proxy = require('express-http-proxy');
const rateLimit = require('express-rate-limit');
const jwt = require('jsonwebtoken');

const app = express();
app.use(express.json());

// Authentication middleware
const authenticate = (req, res, next) => {
    const token = req.headers.authorization?.split(' ')[1];
    
    if (!token) {
        return res.status(401).json({ error: 'No token provided' });
    }
    
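    try {
        // Pin the accepted algorithm (assumes HS256 tokens signed with JWT_SECRET)
        const decoded = jwt.verify(token, process.env.JWT_SECRET, { algorithms: ['HS256'] });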
        req.user = decoded;
        next();
    } catch (error) {
        return res.status(401).json({ error: 'Invalid token' });
    }
};

// Rate limiting
const limiter = rateLimit({
    windowMs: 15 * 60 * 1000, // 15 minutes
    max: 100, // Limit each IP to 100 requests per windowMs
    message: 'Too many requests, please try again later.'
});

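// Apply rate limiting to all requests. express-rate-limit keys on req.ip by
// default; behind a CDN, WAF, or load balancer, set app.set('trust proxy', 1)
// (adjusted to the number of proxies) so req.ip is the client's address.
app.use(limiter);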

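// Route to user service. app.use() strips the mount path, so the upstream
// sees '/123' for '/api/users/123'; pass proxyReqPathResolver to change that.
app.use('/api/users', authenticate, proxy('http://user-service:3001'));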

// Route to product service
app.use('/api/products', authenticate, proxy('http://product-service:3002'));

// Route to order service
app.use('/api/orders', authenticate, proxy('http://order-service:3003'));

// Health check endpoint
app.get('/health', (req, res) => {
    res.json({ status: 'healthy', timestamp: new Date().toISOString() });
});

// Start server
const PORT = process.env.PORT || 3000;
app.listen(PORT, () => {
    console.log(`API Gateway running on port ${PORT}`);
});
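
The gateway above proxies each route to a single service, so the Response Aggregation responsibility listed earlier is not shown. A minimal sketch of it, assuming two hypothetical service clients (`fetchUser`, `fetchOrders`) and a 500 ms per-call timeout: the gateway calls both services in parallel and returns whatever succeeded, flagging a partial response instead of failing the whole request.

```javascript
// Hypothetical service clients. In the gateway these would be HTTP calls to
// user-service and order-service; here one succeeds and one fails.
async function fetchUser(userId) {
    return { id: userId, name: 'Ada' };
}

async function fetchOrders(userId) {
    throw new Error(`order-service unavailable for ${userId}`);
}

// Reject if `promise` takes longer than `ms`, so one slow service cannot
// hold up the aggregated response
function withTimeout(promise, ms) {
    let timer;
    const timeout = new Promise((_, reject) => {
        timer = setTimeout(() => reject(new Error(`timed out after ${ms} ms`)), ms);
    });
    return Promise.race([promise, timeout]).finally(() => clearTimeout(timer));
}

// Fan out in parallel and merge what succeeded. Promise.allSettled never
// rejects, so a failing service degrades the response instead of failing it.
async function getUserDashboard(userId, timeoutMs = 500) {
    const [user, orders] = await Promise.allSettled([
        withTimeout(fetchUser(userId), timeoutMs),
        withTimeout(fetchOrders(userId), timeoutMs)
    ]);

    return {
        user: user.status === 'fulfilled' ? user.value : null,
        orders: orders.status === 'fulfilled' ? orders.value : [],
        // Lets the client know some sections are missing
        partial: user.status === 'rejected' || orders.status === 'rejected'
    };
}
```

In the Express app this would sit behind an illustrative route such as `app.get('/api/dashboard/:id', authenticate, ...)` that responds with `res.json(await getUserDashboard(req.params.id))`.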

Advantages

  • Centralized security management
  • Reduced client complexity
  • Better observability
  • Protocol translation

Disadvantages

  • Single point of failure
  • Performance bottleneck
  • Increased complexity
  • Operational overhead

Rate Limiter Pattern

Protect your APIs from abuse and ensure fair usage by implementing rate limiting strategies.

Rate Limiting Algorithms (Complexity: Low)

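| Algorithm | Description | Use Case |
| --- | --- | --- |
| Token Bucket | Tokens are added at a fixed rate up to a set capacity; each request consumes one token, so bursts up to the capacity are allowed | API rate limiting |
| Leaky Bucket | Requests drain at a constant rate; requests that would overflow the bucket are dropped | Network traffic shaping |
| Fixed Window | Counts requests per fixed time window; simple, but a client can send up to twice the limit across a window boundary | Simple rate limiting |
| Sliding Window | Counts requests over a window that moves with time, avoiding the boundary burst at the cost of more state per client | Accurate rate limiting |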
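
To make the table concrete, here is a minimal in-memory sketch of the two algorithms the Redis-based limiter below does not implement: a fixed-window counter and the "leaky bucket as a meter" variant, which admits or drops requests rather than queueing them. Both are single-process illustrations with an injectable `now` timestamp; the class names are hypothetical. The last lines reproduce the window-boundary problem noted for fixed windows.

```javascript
// Single-process sketches; a multi-instance deployment needs a shared store.
class FixedWindowLimiter {
    constructor(limit, windowMs) {
        this.limit = limit;
        this.windowMs = windowMs;
        this.counters = new Map(); // key -> { windowStart, count }
    }

    allow(key, now = Date.now()) {
        // Windows are aligned to multiples of windowMs
        const windowStart = Math.floor(now / this.windowMs) * this.windowMs;
        const entry = this.counters.get(key);
        if (!entry || entry.windowStart !== windowStart) {
            // First request in a new window resets the counter
            this.counters.set(key, { windowStart, count: 1 });
            return true;
        }
        if (entry.count < this.limit) {
            entry.count += 1;
            return true;
        }
        return false;
    }
}

// Leaky bucket as a meter: the level drains at a constant rate and each
// request adds 1; a request that would overflow the bucket is dropped.
class LeakyBucketLimiter {
    constructor(capacity, leakPerSecond) {
        this.capacity = capacity;
        this.leakPerSecond = leakPerSecond;
        this.buckets = new Map(); // key -> { level, last }
    }

    allow(key, now = Date.now()) {
        const bucket = this.buckets.get(key) || { level: 0, last: now };
        // Drain whatever leaked out since the previous request
        const leaked = ((now - bucket.last) / 1000) * this.leakPerSecond;
        bucket.level = Math.max(0, bucket.level - leaked);
        bucket.last = now;
        this.buckets.set(key, bucket);

        if (bucket.level + 1 > this.capacity) {
            return false; // overflow: drop the request
        }
        bucket.level += 1;
        return true;
    }
}

// Boundary problem: 5 requests at t=999 ms and 5 at t=1000 ms all pass a
// 5-per-second fixed window because they land in different windows.
const fixed = new FixedWindowLimiter(5, 1000);
let passed = 0;
for (let i = 0; i < 5; i++) if (fixed.allow('ip', 999)) passed++;
for (let i = 0; i < 5; i++) if (fixed.allow('ip', 1000)) passed++;
console.log(passed); // prints 10
```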

Redis-based Rate Limiter

Distributed Rate Limiter with Redis
const Redis = require('ioredis');

class RateLimiter {
    constructor(redisConfig) {
        this.redis = new Redis(redisConfig);
    }
    
    /**
     * Sliding window (log) rate limiter
     * @param {string} key - User or IP identifier
     * @param {number} limit - Maximum requests per window
     * @param {number} windowMs - Time window in milliseconds
     * @returns {Promise<{allowed: boolean, remaining: number, reset: number}>}
     */
    async checkRateLimit(key, limit = 100, windowMs = 60000) {
        const now = Date.now();
        const windowStart = now - windowMs;
        // Unique member: two requests in the same millisecond must both count
        const member = `${now}-${Math.random().toString(36).slice(2)}`;

        // MULTI/EXEC applies the commands atomically; a plain pipeline does not
        const results = await this.redis.multi()
            .zremrangebyscore(key, 0, windowStart) // drop timestamps outside the window
            .zadd(key, now, member)                // record this request
            .zcard(key)                            // requests in the window, including this one
            .zrange(key, 0, 0, 'WITHSCORES')       // oldest request still in the window
            .pexpire(key, windowMs)                // let idle keys expire
            .exec();

        const requestCount = results[2][1];
        const oldestTimestamp = Number(results[3][1][1]);
        const allowed = requestCount <= limit;

        if (!allowed) {
            // Rejected requests should not use up quota
            await this.redis.zrem(key, member);
        }

        return {
            allowed,
            remaining: Math.max(0, limit - requestCount),
            // A slot frees up when the oldest request leaves the window
            reset: oldestTimestamp + windowMs
        };
    }
    
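    /**
     * Token bucket rate limiter (atomic via a Lua script)
     * @param {string} key - User or IP identifier
     * @param {number} capacity - Maximum tokens, i.e. the largest allowed burst
     * @param {number} refillRate - Tokens added per second
     * @returns {Promise<{allowed: boolean, remaining: number}>}
     */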
    async tokenBucketRateLimit(key, capacity = 10, refillRate = 1) {
        const now = Math.floor(Date.now() / 1000);
        const bucketKey = `token_bucket:${key}`;
        
        const luaScript = `
            local key = KEYS[1]
            local capacity = tonumber(ARGV[1])
            local refillRate = tonumber(ARGV[2])
            local now = tonumber(ARGV[3])
            
            local bucket = redis.call('hmget', key, 'tokens', 'lastRefill')
            local tokens = tonumber(bucket[1]) or capacity
            local lastRefill = tonumber(bucket[2]) or now
            
            -- Calculate refill
            local timePassed = now - lastRefill
            local refillAmount = timePassed * refillRate
            tokens = math.min(capacity, tokens + refillAmount)
            
            -- Check if request can be processed
            if tokens >= 1 then
                tokens = tokens - 1
                -- HSET with several fields replaces the deprecated HMSET
                redis.call('hset', key, 'tokens', tokens, 'lastRefill', now)
                redis.call('expire', key, 3600)
                -- Note: Lua numbers are truncated to integers in the reply
                return {1, tokens}  -- allowed, remaining tokens
            else
                return {0, tokens}  -- not allowed, remaining tokens
            end
        `;
        
        const result = await this.redis.eval(
            luaScript, 1, bucketKey, capacity, refillRate, now
        );
        
        return {
            allowed: result[0] === 1,
            remaining: result[1]
        };
    }
}

// Usage example
const limiter = new RateLimiter({ host: 'localhost', port: 6379 });

async function handleRequest(userId) {
    const result = await limiter.checkRateLimit(`user:${userId}`, 100, 60000);
    
    if (!result.allowed) {
        throw new Error(`Rate limit exceeded. Try again in ${Math.ceil((result.reset - Date.now()) / 1000)} seconds`);
    }
    
    // Process request...
    console.log(`Request allowed. ${result.remaining} requests remaining.`);
}
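
`handleRequest` above signals a rejection by throwing. In an HTTP gateway the conventional response is `429 Too Many Requests` with a `Retry-After` header. A sketch of an Express-style middleware around the limiter; the middleware name, the `ip:` key prefix, the fail-open policy, and the de facto `X-RateLimit-Remaining` header are assumptions, not part of the original design.

```javascript
// Hypothetical Express-style middleware. `limiter` is anything with
// checkRateLimit(key, limit, windowMs) resolving to { allowed, remaining, reset },
// such as the RateLimiter class above; keying on req.ip is an assumption.
function rateLimitMiddleware(limiter, { limit = 100, windowMs = 60000 } = {}) {
    return async (req, res, next) => {
        let result;
        try {
            result = await limiter.checkRateLimit(`ip:${req.ip}`, limit, windowMs);
        } catch (err) {
            // Fail open: if Redis is unreachable, let traffic through rather than
            // rejecting every request (a policy choice; failing closed is stricter)
            return next();
        }

        // De facto header so well-behaved clients can pace themselves
        res.set('X-RateLimit-Remaining', String(result.remaining));

        if (!result.allowed) {
            // Retry-After is expressed in whole seconds
            const retryAfter = Math.max(1, Math.ceil((result.reset - Date.now()) / 1000));
            res.set('Retry-After', String(retryAfter));
            return res.status(429).json({ error: 'Too many requests' });
        }
        return next();
    };
}

// Usage (assumes an Express app and the RateLimiter above):
// app.use('/api', rateLimitMiddleware(new RateLimiter({ host: 'localhost', port: 6379 })));
```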

Load Balancer Patterns

Distribute traffic across multiple servers to ensure high availability and optimal resource utilization.

Load Balancing Algorithms (Complexity: Medium)

Figure: Clients (traffic source) → Load Balancer (distributes traffic) → Server 1, Server 2, Server 3 (one instance each).

| Algorithm | Description | Best For |
| --- | --- | --- |
| Round Robin | Requests go to each server in turn | Servers with equal capacity |
| Least Connections | Each request goes to the server with the fewest active connections | Long-lived connections |
| IP Hash | A hash of the client IP selects the server (sticky sessions) | Session persistence |
| Weighted Round Robin | Servers receive requests in proportion to their weight (capacity) | Heterogeneous server capacity |
| Least Response Time | Each request goes to the server with the fastest recent response time | Performance optimization |

Important: Always implement health checks to remove unhealthy servers from the pool automatically.
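
A minimal in-process sketch of four of the algorithms in the table, with the health-check rule applied by choosing only among servers marked healthy. The `ServerPool` class, server names, and weights are assumptions for illustration; least response time works like least connections with a latency measurement in place of the connection count.

```javascript
const crypto = require('crypto');

// In-process sketch of a load balancer's server pool. Names, weights, and the
// `healthy` flag are hypothetical; a real balancer flips `healthy` from
// periodic health checks and tracks connections as they open and close.
class ServerPool {
    constructor(servers) {
        this.servers = servers.map((s) => ({
            name: s.name,
            weight: s.weight || 1,
            healthy: true,
            activeConnections: 0,
            currentWeight: 0 // state for smooth weighted round robin
        }));
        this.rrCounter = 0;
    }

    // Only healthy servers are eligible for any algorithm
    healthyServers() {
        const up = this.servers.filter((s) => s.healthy);
        if (up.length === 0) throw new Error('no healthy servers');
        return up;
    }

    // Round robin: each healthy server in turn
    roundRobin() {
        const up = this.healthyServers();
        return up[this.rrCounter++ % up.length];
    }

    // Least connections: the healthy server with the fewest active connections
    leastConnections() {
        return this.healthyServers().reduce((best, s) =>
            (s.activeConnections < best.activeConnections ? s : best));
    }

    // IP hash: the same client IP maps to the same server while the pool is stable
    ipHash(clientIp) {
        const up = this.healthyServers();
        const digest = crypto.createHash('md5').update(clientIp).digest();
        return up[digest.readUInt32BE(0) % up.length];
    }

    // Smooth weighted round robin (the variant nginx uses): every pick, each
    // server gains its weight; the highest is chosen and pays back the total.
    weightedRoundRobin() {
        const up = this.healthyServers();
        const total = up.reduce((sum, s) => sum + s.weight, 0);
        let best = null;
        for (const s of up) {
            s.currentWeight += s.weight;
            if (best === null || s.currentWeight > best.currentWeight) best = s;
        }
        best.currentWeight -= total;
        return best;
    }
}

// Usage: weights 5:1:1 interleave instead of sending five requests to "a" in a row
const pool = new ServerPool([
    { name: 'a', weight: 5 },
    { name: 'b', weight: 1 },
    { name: 'c', weight: 1 }
]);
console.log(Array.from({ length: 7 }, () => pool.weightedRoundRobin().name).join(' '));
// prints "a a b a c a a"
```

One design caveat: IP hashing with `% up.length` remaps many clients whenever a server is added, removed, or marked unhealthy, which breaks stickiness for them; consistent hashing limits that churn.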

Caching Strategies & Patterns

Reduce latency and database load with strategic caching at multiple levels.

Multi-Level Caching Architecture (Complexity: High)

Client-side and Edge Cache: Browser cache, CDN cache (closest to the user, fastest)
Reverse Proxy Cache: Nginx, Varnish (shared cache)
Application Cache: In-memory cache (Redis, Memcached)
Database Cache: Query cache, materialized views
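
A common addition to these levels is a small in-process cache inside each application instance, checked before the shared Redis or Memcached tier. A minimal sketch, assuming an LRU map with a short per-entry TTL (capacity and TTL are illustrative) and a `remote` object with async `get`/`set` standing in for the shared cache; `LocalLRUCache` and `readThroughLevels` are hypothetical names.

```javascript
// In-process LRU cache with a per-entry TTL, meant as a small first level in
// front of a shared cache. Capacity and TTL defaults are assumptions.
class LocalLRUCache {
    constructor(maxEntries = 1000, ttlMs = 5000) {
        this.maxEntries = maxEntries;
        this.ttlMs = ttlMs;
        this.entries = new Map(); // Map keeps insertion order: least recent first
    }

    get(key, now = Date.now()) {
        const entry = this.entries.get(key);
        if (!entry) return undefined;
        if (entry.expiresAt <= now) {
            this.entries.delete(key); // expired
            return undefined;
        }
        // Re-insert to mark the entry as most recently used
        this.entries.delete(key);
        this.entries.set(key, entry);
        return entry.value;
    }

    set(key, value, now = Date.now()) {
        this.entries.delete(key);
        this.entries.set(key, { value, expiresAt: now + this.ttlMs });
        if (this.entries.size > this.maxEntries) {
            // Evict the least recently used entry (the first key in the Map)
            const oldestKey = this.entries.keys().next().value;
            this.entries.delete(oldestKey);
        }
    }
}

// Two-level read: local LRU, then the shared cache, then the source of truth.
// `remote` is any object with async get/set, e.g. a thin wrapper over Redis.
async function readThroughLevels(key, local, remote, loadFromSource) {
    const hit = local.get(key);
    if (hit !== undefined) return hit;

    let value = await remote.get(key);
    if (value === undefined || value === null) {
        value = await loadFromSource(key);
        await remote.set(key, value);
    }
    local.set(key, value);
    return value;
}
```

Keep the local TTL short: each instance holds its own copy, so a write handled by another instance stays invisible here until the entry expires.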

Cache Patterns Implementation

Cache-Aside Pattern with Redis
const Redis = require('ioredis');

class CacheService {
    constructor() {
        this.redis = new Redis({
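            host: process.env.REDIS_HOST,
            // Environment variables are strings; ioredis expects a numeric port
            port: Number(process.env.REDIS_PORT) || 6379,
            retryStrategy: (times) => {
                // Linear backoff: 50 ms per attempt, capped at 2 seconds
                const delay = Math.min(times * 50, 2000);
                return delay;
            }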
        });
    }
    
    /**
     * Cache-Aside pattern implementation
     */
    async getOrSet(key, fetchFn, ttlSeconds = 300) {
        // Try to get from cache (GET returns null on a miss)
        const cachedData = await this.redis.get(key);
        
        if (cachedData !== null) {
            return JSON.parse(cachedData);
        }
        
        // Cache miss - fetch from source
        const freshData = await fetchFn();
        
        // Store in cache with TTL
        await this.redis.setex(
            key,
            ttlSeconds,
            JSON.stringify(freshData)
        );
        
        return freshData;
    }
    
    /**
     * Write-Through cache pattern
     */
    async writeThrough(key, data, writeFn, ttlSeconds = 300) {
        // Write to database first
        await writeFn(data);
        
        // Then write to cache
        await this.redis.setex(
            key,
            ttlSeconds,
            JSON.stringify(data)
        );
        
        return data;
    }
    
    /**
     * Cache invalidation strategies
     */
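    async invalidateCache(pattern) {
        // Delete specific key
        if (!pattern.includes('*')) {
            return await this.redis.del(pattern);
        }
        
        // Delete keys matching the pattern. SCAN walks the keyspace in small
        // batches; KEYS would block Redis until it has checked every key.
        let deleted = 0;
        const stream = this.redis.scanStream({ match: pattern, count: 100 });
        for await (const keys of stream) {
            if (keys.length > 0) {
                deleted += await this.redis.del(...keys);
            }
        }
        
        return deleted;
    }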
    
    /**
     * Cache warming on application startup
     */
    async warmCache(keysToWarm, fetchFn) {
        const promises = keysToWarm.map(async (key) => {
            const data = await fetchFn(key);
            await this.redis.setex(key, 3600, JSON.stringify(data));
        });
        
        await Promise.all(promises);
        console.log(`Warmed cache for ${keysToWarm.length} keys`);
    }
}

// Usage example
const cache = new CacheService();

async function getUserProfile(userId) {
    return await cache.getOrSet(
        `user:${userId}:profile`,
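        async () => {
            // Called only on a cache miss. UserModel is assumed to be a Mongoose
            // model; lean() returns a plain object, the same shape a cache hit
            // returns after JSON.parse.
            console.log(`Cache miss for user ${userId}, fetching from DB`);
            return await UserModel.findById(userId).lean();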
        },
        600 // 10 minutes TTL
    );
}
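
`getOrSet` above is exposed to a cache stampede: when a hot key expires, every concurrent request misses and calls `fetchFn` at the same moment. A minimal per-process mitigation is request coalescing, sketched here with a hypothetical `SingleFlight` helper (not an ioredis feature). Cross-instance protection would need something like a short-lived Redis lock, which this sketch does not attempt.

```javascript
// Per-process request coalescing ("single flight"): concurrent callers asking
// for the same key share one in-flight promise. SingleFlight is hypothetical.
class SingleFlight {
    constructor() {
        this.inFlight = new Map(); // key -> pending promise
    }

    run(key, loadFn) {
        const pending = this.inFlight.get(key);
        if (pending) return pending;

        const promise = Promise.resolve()
            .then(loadFn)
            // Forget the key once settled, so the next miss triggers a fresh load
            .finally(() => this.inFlight.delete(key));
        this.inFlight.set(key, promise);
        return promise;
    }
}

// Demo: three concurrent misses for the same key trigger one slow load
const flights = new SingleFlight();
let loadCount = 0;
const slowLoad = () => new Promise((resolve) => {
    loadCount++;
    setTimeout(() => resolve({ id: 42 }), 50);
});

Promise.all([1, 2, 3].map(() => flights.run('user:42:profile', slowLoad)))
    .then(() => console.log(`loads: ${loadCount}`)); // prints "loads: 1"
```

It slots into the existing call as `cache.getOrSet(key, () => flights.run(key, fetchFn))`, so only one caller per process reaches the database while the others wait on the same promise.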

In-Memory Cache

  • Redis: Rich data structures, optional persistence
  • Memcached: Simple, multi-threaded
  • Hazelcast: Distributed, fault-tolerant
  • AWS ElastiCache: Managed Redis/Memcached

Tiered / Disk-Backed Cache

  • Ehcache: Java, tiered heap/off-heap/disk storage
  • Couchbase: Document store with a built-in memory cache

CQRS (Command Query Responsibility Segregation)

Separate read and write operations to optimize for different scalability requirements.

CQRS Architecture Overview (Complexity: High)

Figure: Command Side (write operations) → Event Bus (publishes events) → Read Side (optimized for reads).

CQRS Implementation Example

CQRS with Event Sourcing
const { v4: uuidv4 } = require('uuid');

// Command side - Write model
class UserCommandService {
    constructor(eventStore, eventBus) {
        this.eventStore = eventStore;
        this.eventBus = eventBus;
    }
    
    async createUser(command) {
        const userId = uuidv4();
        
        // Create events
        const events = [
            {
                type: 'UserCreated',
                data: {
                    userId,
                    email: command.email,
                    name: command.name,
                    timestamp: new Date().toISOString()
                }
            }
        ];
        
        // Store events
        await this.eventStore.appendEvents(userId, events);
        
        // Publish events
        for (const event of events) {
            await this.eventBus.publish(event);
        }
        
        return userId;
    }
    
    async updateUserEmail(userId, newEmail) {
        const events = [
            {
                type: 'UserEmailUpdated',
                data: {
                    userId,
                    newEmail,
                    timestamp: new Date().toISOString()
                }
            }
        ];
        
        await this.eventStore.appendEvents(userId, events);
        
        for (const event of events) {
            await this.eventBus.publish(event);
        }
    }
}

// Read side - Optimized for queries
// Assumes readDatabase.users is a MongoDB driver Collection, the same API the
// projection below uses for insertOne/updateOne.
class UserReadService {
    constructor(readDatabase) {
        this.readDatabase = readDatabase;
    }
    
    async getUser(userId) {
        // Direct lookup by primary key on the denormalized read model
        return await this.readDatabase.users.findOne({ _id: userId });
    }
    
    async searchUsers(query, page = 1, limit = 20) {
        // Escape regex metacharacters so user input is matched literally
        const escaped = query.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
        const filter = {
            $or: [
                { name: { $regex: escaped, $options: 'i' } },
                { email: { $regex: escaped, $options: 'i' } }
            ]
        };
        
        return await this.readDatabase.users
            .find(filter)
            .skip((page - 1) * limit)
            .limit(limit)
            .toArray();
    }
}

// Event handler to update read model
class UserProjection {
    constructor(readDatabase) {
        this.readDatabase = readDatabase;
    }
    
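    async handleUserCreated(event) {
        // Upsert so a redelivered event (at-least-once delivery) is harmless
        await this.readDatabase.users.updateOne(
            { _id: event.data.userId },
            {
                $setOnInsert: {
                    email: event.data.email,
                    name: event.data.name,
                    createdAt: event.data.timestamp,
                    updatedAt: event.data.timestamp
                }
            },
            { upsert: true }
        );
    }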
    
    async handleUserEmailUpdated(event) {
        await this.readDatabase.users.updateOne(
            { _id: event.data.userId },
            {
                $set: {
                    email: event.data.newEmail,
                    updatedAt: event.data.timestamp
                }
            }
        );
    }
}
When to use CQRS: Only when read and write workloads have significantly different requirements. It adds moving parts, and the read model is only eventually consistent with the write model.
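
The command service above expects an `eventStore` with `appendEvents(id, events)` and an `eventBus` with `publish(event)`, but it never shows how current state is rebuilt from stored events. A minimal in-memory sketch of both dependencies plus a replay function; `InMemoryEventStore`, `InMemoryEventBus`, and `replayUser` are hypothetical names, suited to tests rather than production.

```javascript
// In-memory stand-ins for the eventStore and eventBus used above (hypothetical)
class InMemoryEventStore {
    constructor() {
        this.streams = new Map(); // aggregateId -> ordered array of events
    }

    async appendEvents(aggregateId, events) {
        const stream = this.streams.get(aggregateId) || [];
        stream.push(...events);
        this.streams.set(aggregateId, stream);
    }

    async getEvents(aggregateId) {
        return [...(this.streams.get(aggregateId) || [])];
    }
}

class InMemoryEventBus {
    constructor() {
        this.handlers = new Map(); // event type -> handler functions
    }

    subscribe(type, handler) {
        const list = this.handlers.get(type) || [];
        list.push(handler);
        this.handlers.set(type, list);
    }

    async publish(event) {
        for (const handler of this.handlers.get(event.type) || []) {
            await handler(event);
        }
    }
}

// Event sourcing: current state is a left fold over the aggregate's events
function replayUser(events) {
    return events.reduce((state, event) => {
        switch (event.type) {
            case 'UserCreated':
                return { id: event.data.userId, email: event.data.email, name: event.data.name };
            case 'UserEmailUpdated':
                return { ...state, email: event.data.newEmail };
            default:
                return state; // ignore event types this fold does not handle
        }
    }, null);
}
```

Wiring the read side is then a matter of subscribing the projection, for example `bus.subscribe('UserCreated', (e) => projection.handleUserCreated(e))`.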

Building Scalable Systems

Principles and patterns for designing systems that can handle growth gracefully.

Scalability Dimensions (Complexity: High)

Vertical Scaling: Add more resources to a single node (CPU, RAM)
Horizontal Scaling: Add more nodes to distribute load
Geographic Scaling: Distribute across regions for latency
Functional Scaling: Split by business capabilities (microservices)

Scalability Checklist

| Area | Requirements | Tools/Solutions |
| --- | --- | --- |
| Database | Sharding, replication, read replicas, connection pooling | MongoDB sharding, PostgreSQL read replicas, Redis Cluster |
| Caching | Multi-level caching, cache invalidation, CDN | Redis Cluster, Varnish, CloudFront, Akamai |
| Message Queue | Async processing, event-driven architecture, backpressure | Kafka, RabbitMQ, AWS SQS, Google Pub/Sub |
| Monitoring | Real-time metrics, distributed tracing, logging | Prometheus, Grafana, Jaeger, ELK Stack |
| Deployment | Blue-green, canary, rolling updates, auto-scaling | Kubernetes, Docker, AWS ECS, Terraform |
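
The Database row lists sharding. Managed options such as MongoDB sharding or Redis Cluster place keys themselves; for application-level sharding, one common technique is consistent hashing. A minimal sketch, assuming MD5-derived ring positions and 100 virtual nodes per shard (both illustrative choices; `HashRing` is a hypothetical name).

```javascript
const crypto = require('crypto');

// Consistent hash ring with virtual nodes. MD5 positions and 100 virtual
// nodes per shard are illustrative choices, not requirements.
class HashRing {
    constructor(nodes = [], virtualNodes = 100) {
        this.virtualNodes = virtualNodes;
        this.ring = []; // { hash, node } entries sorted by hash
        nodes.forEach((node) => this.addNode(node));
    }

    static hash(value) {
        // First 4 bytes of MD5 as an unsigned 32-bit ring position
        return crypto.createHash('md5').update(value).digest().readUInt32BE(0);
    }

    addNode(node) {
        for (let i = 0; i < this.virtualNodes; i++) {
            this.ring.push({ hash: HashRing.hash(`${node}#${i}`), node });
        }
        this.ring.sort((a, b) => a.hash - b.hash);
    }

    removeNode(node) {
        this.ring = this.ring.filter((entry) => entry.node !== node);
    }

    // A key belongs to the first virtual node at or after its hash (clockwise)
    getNode(key) {
        if (this.ring.length === 0) throw new Error('ring is empty');
        const h = HashRing.hash(key);
        let lo = 0;
        let hi = this.ring.length;
        while (lo < hi) { // binary search for the first entry with hash >= h
            const mid = (lo + hi) >>> 1;
            if (this.ring[mid].hash < h) lo = mid + 1;
            else hi = mid;
        }
        return this.ring[lo % this.ring.length].node; // past the end wraps to the start
    }
}

// Usage
const shards = new HashRing(['shard-a', 'shard-b', 'shard-c']);
console.log(shards.getNode('user:42')); // one of the three shard names
```

With plain `hash(key) % N`, growing from 3 to 4 shards keeps a key in place only when `hash mod 3` equals `hash mod 4`, which holds for about 25% of keys, so roughly 75% move. On the ring, only keys that now fall on the new shard's virtual nodes move, about a quarter in this case.
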
Golden Rule: Design for failure. Assume everything will fail at some point and build resilience through redundancy, circuit breakers, and graceful degradation.
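
Circuit breakers, one of the resilience tools named in the rule above, fit in a few lines. This is a minimal sketch under stated assumptions: consecutive-failure counting, a fixed reset timeout, and a single trial request in the half-open state. The thresholds and the `CircuitBreaker` name are illustrative.

```javascript
// Minimal circuit breaker: CLOSED -> OPEN after `failureThreshold` consecutive
// failures; after `resetTimeoutMs`, one trial call runs in HALF_OPEN and its
// outcome decides between CLOSED and OPEN. Defaults are illustrative.
class CircuitBreaker {
    constructor(action, { failureThreshold = 5, resetTimeoutMs = 10000, now = Date.now } = {}) {
        this.action = action;
        this.failureThreshold = failureThreshold;
        this.resetTimeoutMs = resetTimeoutMs;
        this.now = now; // injectable clock, handy for tests
        this.state = 'CLOSED';
        this.failures = 0;
        this.openedAt = 0;
    }

    async call(...args) {
        if (this.state === 'OPEN') {
            if (this.now() - this.openedAt < this.resetTimeoutMs) {
                // Fail fast instead of piling requests onto a struggling dependency
                throw new Error('circuit open');
            }
            this.state = 'HALF_OPEN'; // this call becomes the single trial request
        } else if (this.state === 'HALF_OPEN') {
            // A trial request is already in flight
            throw new Error('circuit open');
        }

        try {
            const result = await this.action(...args);
            this.state = 'CLOSED';
            this.failures = 0;
            return result;
        } catch (err) {
            this.failures += 1;
            if (this.state === 'HALF_OPEN' || this.failures >= this.failureThreshold) {
                this.state = 'OPEN';
                this.openedAt = this.now();
            }
            throw err;
        }
    }
}
```

Graceful degradation pairs with it: callers catch the fast failure and serve a fallback, e.g. `breaker.call(id).catch(() => cachedRecommendations)`, where `cachedRecommendations` is a hypothetical stale-but-usable value.
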

Real-World Architecture: E-commerce Platform

How to apply these patterns in a real e-commerce system handling 10,000 requests per second.

E-commerce Platform Architecture

Figure: Mobile App (client) → CDN (static assets) → WAF (security) → API Gateway (routing) → Load Balancer (traffic distribution) → Cart Service (Redis cache), Product Service (read replicas), Payment Service (queue-based), Order Service (event-driven).

Key Design Decisions

Product Catalog: Use read replicas for product queries, Redis cache for hot products
Shopping Cart: Redis for ephemeral cart data, session-based with TTL
Checkout Process: Saga pattern for distributed transactions, queue for payment processing
Inventory Management: Event sourcing for inventory changes, CQRS for reporting
Search: Elasticsearch for product search, updated via change data capture
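
The checkout decision relies on the Saga pattern. A minimal sketch of an orchestrated saga, assuming each step supplies an `action` and a `compensate` function; the step names and in-memory services are hypothetical. Steps run in order, and a failure triggers the compensations of completed steps in reverse. Choreographed sagas, driven by events between services instead of a central coordinator, are the common alternative.

```javascript
// Orchestrated saga: run steps in order; if one fails, run the compensations
// of the already-completed steps in reverse order. A production saga would
// also persist its progress so it can resume or retry after a crash.
async function runSaga(steps, context) {
    const completed = [];
    for (const step of steps) {
        try {
            await step.action(context);
            completed.push(step);
        } catch (err) {
            // Undo completed steps, newest first
            for (const done of [...completed].reverse()) {
                await done.compensate(context);
            }
            return { ok: false, failedStep: step.name, error: err.message };
        }
    }
    return { ok: true };
}

// Usage with in-memory stand-ins for hypothetical inventory, payment, and
// shipping services; the shipment step fails to show compensation.
const log = [];
const checkoutSteps = [
    {
        name: 'reserveInventory',
        action: async (ctx) => { log.push(`reserve ${ctx.sku}`); },
        compensate: async (ctx) => { log.push(`release ${ctx.sku}`); }
    },
    {
        name: 'chargePayment',
        action: async (ctx) => { log.push(`charge ${ctx.amount}`); },
        compensate: async (ctx) => { log.push(`refund ${ctx.amount}`); }
    },
    {
        name: 'createShipment',
        action: async () => { throw new Error('no carrier available'); },
        compensate: async () => {}
    }
];

runSaga(checkoutSteps, { sku: 'SKU-1', amount: 49.99 }).then((result) => {
    console.log(result.failedStep); // prints "createShipment"
    console.log(log.join(' -> '));
    // prints "reserve SKU-1 -> charge 49.99 -> refund 49.99 -> release SKU-1"
});
```

Compensations should be idempotent, because a real coordinator retries them when a compensating call itself fails.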