System Design Guide 2026 – Scalable Architecture Patterns

Master scalable architecture patterns: API Gateway, Rate Limiter, Load Balancer, Caching, CQRS, and distributed systems design for modern applications.

Modern System Architecture 2026

Building scalable systems requires understanding fundamental patterns and trade-offs. This guide covers essential components for designing systems that handle millions of requests.

Figure: Modern Scalable Architecture. Clients (web/mobile apps) → API Gateway (single entry point) → Load Balancer (traffic distribution) → Microservices (scalable services).

Key Principle: Design for horizontal scaling, anticipate failures, and implement observability from day one.

API Gateway Pattern

The API Gateway is a single entry point for all client requests, handling cross-cutting concerns like authentication, routing, and rate limiting.

Core Responsibilities (Medium Complexity)

Request Routing: Route requests to appropriate microservices

Authentication: Validate JWT tokens and API keys

Rate Limiting: Prevent API abuse

Response Aggregation: Combine multiple service responses

Monitoring: Log requests and track metrics
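
Of the responsibilities above, response aggregation is the one the Express gateway example in this guide does not show. The following is a minimal sketch, not a definitive implementation. `fetchUser` and `fetchOrders` are hypothetical stand-ins for HTTP calls to downstream services, and `aggregate` and `getUserDashboard` are illustrative names rather than part of Express or any library.

```javascript
// Hypothetical downstream calls; in a real gateway these would be HTTP requests.
async function fetchUser(userId) {
    return { id: userId, name: 'Ada' };
}

async function fetchOrders(userId) {
    // Simulates an outage of the (hypothetical) order service.
    throw new Error('order-service unavailable');
}

/**
 * Call several services in parallel and merge the results into one response.
 * A failed service yields null for its field instead of failing the whole call.
 */
async function aggregate(calls) {
    const names = Object.keys(calls);
    const settled = await Promise.allSettled(names.map((name) => calls[name]()));

    const body = {};
    const errors = [];
    settled.forEach((result, i) => {
        if (result.status === 'fulfilled') {
            body[names[i]] = result.value;
        } else {
            body[names[i]] = null;
            errors.push(names[i]);
        }
    });
    return { body, errors };
}

async function getUserDashboard(userId) {
    return aggregate({
        user: () => fetchUser(userId),
        orders: () => fetchOrders(userId),
    });
}
```

Using `Promise.allSettled` rather than `Promise.all` is a design choice: the client still receives the parts that succeeded, and the `errors` list tells it which fields are missing.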

API Gateway Implementation

Node.js API Gateway with Express

const express = require('express');
const proxy = require('express-http-proxy');
const rateLimit = require('express-rate-limit');
const jwt = require('jsonwebtoken');

const app = express();
app.use(express.json());

// Authentication middleware
const authenticate = (req, res, next) => {
    const token = req.headers.authorization?.split(' ')[1];
    
    if (!token) {
        return res.status(401).json({ error: 'No token provided' });
    }
    
    try {
        const decoded = jwt.verify(token, process.env.JWT_SECRET);
        req.user = decoded;
        next();
    } catch (error) {
        return res.status(401).json({ error: 'Invalid token' });
    }
};

// Rate limiting
const limiter = rateLimit({
    windowMs: 15 * 60 * 1000, // 15 minutes
    max: 100, // Limit each IP to 100 requests per windowMs
    message: 'Too many requests, please try again later.'
});

// Apply rate limiting to all requests
app.use(limiter);

// Route to user service
app.use('/api/users', authenticate, proxy('http://user-service:3001'));

// Route to product service
app.use('/api/products', authenticate, proxy('http://product-service:3002'));

// Route to order service
app.use('/api/orders', authenticate, proxy('http://order-service:3003'));

// Health check endpoint
app.get('/health', (req, res) => {
    res.json({ status: 'healthy', timestamp: new Date().toISOString() });
});

// Start server
const PORT = process.env.PORT || 3000;
app.listen(PORT, () => {
    console.log(`API Gateway running on port ${PORT}`);
});

Advantages

Centralized security management
Reduced client complexity
Better observability
Protocol translation

Disadvantages

Single point of failure
Performance bottleneck
Increased complexity
Operational overhead

Rate Limiter Pattern

Protect your APIs from abuse and ensure fair usage by implementing rate limiting strategies.

Rate Limiting Algorithms (Low Complexity)

Algorithm	Description	Use Case
Token Bucket	Tokens added at fixed rate, each request consumes token	API rate limiting
Leaky Bucket	Requests processed at constant rate, excess dropped	Network traffic shaping
Fixed Window	Count requests in fixed time window	Simple rate limiting
Sliding Window	Count requests in sliding time window	Accurate rate limiting
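
To make the difference between the fixed and sliding windows in the table concrete, here is a minimal in-memory sketch of a fixed window counter. It is for a single process only and is meant as an illustration. `FixedWindowLimiter` and the injectable `now` argument are assumptions introduced for this example.

```javascript
/**
 * In-memory fixed window counter (single process only).
 * `now` is injectable so the behaviour can be checked deterministically.
 */
class FixedWindowLimiter {
    constructor(limit, windowMs) {
        this.limit = limit;
        this.windowMs = windowMs;
        this.counters = new Map(); // key -> { windowId, count }
    }

    allow(key, now = Date.now()) {
        const windowId = Math.floor(now / this.windowMs);
        let entry = this.counters.get(key);

        // A new window has started: reset the counter.
        if (!entry || entry.windowId !== windowId) {
            entry = { windowId, count: 0 };
            this.counters.set(key, entry);
        }

        if (entry.count >= this.limit) {
            return false;
        }
        entry.count += 1;
        return true;
    }
}
```

With a limit of 2 and a 1000 ms window, requests at t=900 and t=950 are allowed, and so are requests at t=1000 and t=1050, because the counter resets at the window boundary. That is four requests within 150 ms, which is the burst a sliding window avoids.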

Redis-based Rate Limiter

Distributed Rate Limiter with Redis

const Redis = require('ioredis');

class RateLimiter {
    constructor(redisConfig) {
        this.redis = new Redis(redisConfig);
    }
    
    /**
     * Sliding window rate limiter (request log kept in a Redis sorted set)
     * @param {string} key - User or IP identifier
     * @param {number} limit - Maximum requests
     * @param {number} windowMs - Time window in milliseconds
     * @returns {Promise<{allowed: boolean, remaining: number, reset: number}>}
     */
    async checkRateLimit(key, limit = 100, windowMs = 60000) {
        const now = Date.now();
        const windowStart = now - windowMs;
        
        // MULTI/EXEC so the four commands run as one atomic unit
        const tx = this.redis.multi();
        
        // Remove timestamps that have left the window
        tx.zremrangebyscore(key, 0, windowStart);
        
        // Record this request; the random suffix keeps members unique
        // when several requests arrive in the same millisecond.
        // Note: rejected requests are recorded too, so a client that
        // keeps retrying stays limited.
        tx.zadd(key, now, `${now}-${Math.random()}`);
        
        // Count requests in the window, including this one
        tx.zcard(key);
        
        // Let idle keys expire on their own
        tx.pexpire(key, windowMs);
        
        const results = await tx.exec();
        const requestCount = results[2][1];
        
        return {
            allowed: requestCount <= limit,
            remaining: Math.max(0, limit - requestCount),
            // Upper bound: by then every request now in the window has aged out
            reset: now + windowMs
        };
    }
    
    /**
     * Token bucket rate limiter
     */
    async tokenBucketRateLimit(key, capacity = 10, refillRate = 1) {
        const now = Math.floor(Date.now() / 1000);
        const bucketKey = `token_bucket:${key}`;
        
        const luaScript = `
            local key = KEYS[1]
            local capacity = tonumber(ARGV[1])
            local refillRate = tonumber(ARGV[2])
            local now = tonumber(ARGV[3])
            
            local bucket = redis.call('hmget', key, 'tokens', 'lastRefill')
            local tokens = tonumber(bucket[1]) or capacity
            local lastRefill = tonumber(bucket[2]) or now
            
            -- Calculate refill
            local timePassed = now - lastRefill
            local refillAmount = timePassed * refillRate
            tokens = math.min(capacity, tokens + refillAmount)
            
            -- Check if request can be processed
            if tokens >= 1 then
                tokens = tokens - 1
                redis.call('hmset', key, 'tokens', tokens, 'lastRefill', now)
                redis.call('expire', key, 3600)
                return {1, tokens}  -- allowed, remaining tokens
            else
                return {0, tokens}  -- not allowed, remaining tokens
            end
        `;
        
        const result = await this.redis.eval(
            luaScript, 1, bucketKey, capacity, refillRate, now
        );
        
        return {
            allowed: result[0] === 1,
            remaining: result[1]
        };
    }
}

// Usage example
const limiter = new RateLimiter({ host: 'localhost', port: 6379 });

async function handleRequest(userId) {
    const result = await limiter.checkRateLimit(`user:${userId}`, 100, 60000);
    
    if (!result.allowed) {
        throw new Error(`Rate limit exceeded. Try again in ${Math.ceil((result.reset - Date.now()) / 1000)} seconds`);
    }
    
    // Process request...
    console.log(`Request allowed. ${result.remaining} requests remaining.`);
}

Load Balancer Patterns

Distribute traffic across multiple servers to ensure high availability and optimal resource utilization.

Load Balancing Algorithms (Medium Complexity)

Figure: Clients (traffic source) → Load Balancer (distributes traffic) → Server 1, Server 2, Server 3 (instances 1 to 3).

Algorithm	Description	Best For
Round Robin	Requests distributed sequentially to each server	Equal capacity servers
Least Connections	Send request to server with fewest active connections	Long-lived connections
IP Hash	Client IP determines server (sticky sessions)	Session persistence needed
Weighted Round Robin	Servers get requests proportional to their weight/capacity	Heterogeneous server capacity
Least Response Time	Forward to server with fastest response time	Performance optimization
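
Three of the selection rules in the table can be sketched as small functions. This is an illustrative in-process sketch only. The server objects and their `activeConnections` and `weight` fields are assumptions, and production load balancers such as Nginx or HAProxy implement these strategies internally.

```javascript
/** Pick the server with the fewest active connections (the first one wins ties). */
function pickLeastConnections(servers) {
    return servers.reduce((best, s) =>
        (s.activeConnections < best.activeConnections ? s : best));
}

/** Map a client IP to a server so the same client keeps reaching the same one. */
function pickByIpHash(servers, clientIp) {
    let hash = 0;
    for (const ch of clientIp) {
        hash = (hash * 31 + ch.charCodeAt(0)) >>> 0; // keep an unsigned 32-bit value
    }
    return servers[hash % servers.length];
}

/** Round robin over a list expanded by weight, e.g. weight 3 gives three slots. */
function createWeightedRoundRobin(servers) {
    const slots = servers.flatMap((s) => Array(s.weight).fill(s));
    let next = 0;
    return () => {
        const server = slots[next];
        next = (next + 1) % slots.length;
        return server;
    };
}
```

One caveat on the IP hash variant: because it uses a modulo over the pool size, adding or removing a server changes the mapping for many clients, so sticky sessions are lost for them.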

Important: Always implement health checks to remove unhealthy servers from the pool automatically.
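
A minimal sketch of that idea follows: a round-robin pool that stops routing to a server after several consecutive failed checks and restores it after one successful check. `ServerPool`, `probe`, and `failureThreshold` are hypothetical names chosen for this example. A real `probe` would typically request an endpoint such as `/health` with a timeout.

```javascript
/**
 * Round-robin pool that skips servers marked unhealthy.
 * `probe` is a hypothetical async function (url) => boolean.
 */
class ServerPool {
    constructor(servers, probe, failureThreshold = 3) {
        this.servers = servers.map((url) => ({ url, healthy: true, failures: 0 }));
        this.probe = probe;
        this.failureThreshold = failureThreshold;
        this.next = 0;
    }

    /** Probe every server once; a real deployment would run this on a timer. */
    async runHealthChecks() {
        await Promise.all(this.servers.map(async (server) => {
            let ok = false;
            try {
                ok = await this.probe(server.url);
            } catch {
                ok = false; // a thrown error or timeout counts as a failed check
            }
            if (ok) {
                server.failures = 0;
                server.healthy = true;
            } else {
                server.failures += 1;
                if (server.failures >= this.failureThreshold) {
                    server.healthy = false;
                }
            }
        }));
    }

    /** Return the next healthy server URL, or null if none are healthy. */
    pick() {
        for (let i = 0; i < this.servers.length; i++) {
            const server = this.servers[(this.next + i) % this.servers.length];
            if (server.healthy) {
                this.next = (this.next + i + 1) % this.servers.length;
                return server.url;
            }
        }
        return null;
    }
}
```

Requiring several consecutive failures before removal is a deliberate choice in this sketch: it keeps a single slow response from ejecting an otherwise healthy server.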

Caching Strategies & Patterns

Reduce latency and database load with strategic caching at multiple levels.

Multi-Level Caching Architecture (High Complexity)

Client and Edge Cache: Browser cache on the device, CDN cache at the edge (closest to the user, so fastest)

Reverse Proxy Cache: Nginx, Varnish (shared cache)

Application Cache: In-memory cache (Redis, Memcached)

Database Cache: Query cache, materialized views
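
The way a read falls through these levels can be sketched with two tiers: a per-process map in front of a shared cache, with the database as the last resort. This is a simplified illustration. `TieredCache`, `sharedCache`, and `createFakeSharedCache` are hypothetical names, and the shared cache is faked with a `Map` where production code would use Redis or Memcached.

```javascript
/**
 * Two-level read path: in-process Map (L1) in front of a shared cache (L2).
 * `sharedCache` is a hypothetical object with async get(key) / set(key, value).
 */
class TieredCache {
    constructor(sharedCache, l1TtlMs = 5000) {
        this.l1 = new Map(); // key -> { value, expiresAt }
        this.sharedCache = sharedCache;
        this.l1TtlMs = l1TtlMs;
    }

    async get(key, loadFromSource, now = Date.now()) {
        const local = this.l1.get(key);
        if (local && local.expiresAt > now) {
            return { value: local.value, servedBy: 'l1' };
        }

        let value = await this.sharedCache.get(key);
        let servedBy = 'l2';
        if (value === undefined) {
            value = await loadFromSource(key); // slowest path: the database
            await this.sharedCache.set(key, value);
            servedBy = 'source';
        }

        // Refresh the local copy so the next read stays in-process.
        this.l1.set(key, { value, expiresAt: now + this.l1TtlMs });
        return { value, servedBy };
    }
}

// Stand-in for a shared cache, backed by a Map for the sake of the example.
function createFakeSharedCache() {
    const store = new Map();
    return {
        get: async (key) => store.get(key),
        set: async (key, value) => { store.set(key, value); },
    };
}
```

The short L1 TTL is the trade-off to watch: each application instance may serve a value that is stale by up to `l1TtlMs`, in exchange for skipping a network round trip.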

Cache Patterns Implementation

Cache-Aside Pattern with Redis

const Redis = require('ioredis');

class CacheService {
    constructor() {
        this.redis = new Redis({
            host: process.env.REDIS_HOST,
            port: process.env.REDIS_PORT,
            retryStrategy: (times) => {
                // Linear backoff (50 ms per attempt), capped at 2 seconds
                const delay = Math.min(times * 50, 2000);
                return delay;
            }
        });
    }
    
    /**
     * Cache-Aside pattern implementation
     */
    async getOrSet(key, fetchFn, ttlSeconds = 300) {
        // Try to get from cache
        let cachedData = await this.redis.get(key);
        
        if (cachedData) {
            return JSON.parse(cachedData);
        }
        
        // Cache miss - fetch from source
        const freshData = await fetchFn();
        
        // Store in cache with TTL
        await this.redis.setex(
            key,
            ttlSeconds,
            JSON.stringify(freshData)
        );
        
        return freshData;
    }
    
    /**
     * Write-Through cache pattern
     */
    async writeThrough(key, data, writeFn, ttlSeconds = 300) {
        // Write to database first
        await writeFn(data);
        
        // Then write to cache
        await this.redis.setex(
            key,
            ttlSeconds,
            JSON.stringify(data)
        );
        
        return data;
    }
    
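    /**
     * Cache invalidation: delete one key, or every key matching a glob pattern
     */
    async invalidateCache(pattern) {
        // Delete specific key
        if (!pattern.includes('*')) {
            return await this.redis.del(pattern);
        }
        
        // Delete keys matching pattern. SCAN iterates in batches, whereas
        // KEYS blocks Redis while it walks the whole keyspace.
        let deleted = 0;
        const stream = this.redis.scanStream({ match: pattern, count: 100 });
        for await (const keys of stream) {
            if (keys.length > 0) {
                deleted += await this.redis.del(...keys);
            }
        }
        
        return deleted;
    }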
    
    /**
     * Cache warming on application startup
     */
    async warmCache(keysToWarm, fetchFn) {
        const promises = keysToWarm.map(async (key) => {
            const data = await fetchFn(key);
            await this.redis.setex(key, 3600, JSON.stringify(data));
        });
        
        await Promise.all(promises);
        console.log(`Warmed cache for ${keysToWarm.length} keys`);
    }
}

// Usage example
const cache = new CacheService();

async function getUserProfile(userId) {
    return await cache.getOrSet(
        `user:${userId}:profile`,
        async () => {
            // This function is called only on cache miss
            console.log(`Cache miss for user ${userId}, fetching from DB`);
            return await UserModel.findById(userId);
        },
        600 // 10 minutes TTL
    );
}

In-Memory Cache

Redis: Rich data structures, optional persistence
Memcached: Simple, multi-threaded
Hazelcast: Distributed, fault-tolerant
AWS ElastiCache: Managed Redis/Memcached

Disk-Backed Cache

Ehcache: Java, tiered storage (heap, off-heap, disk)
Couchbase: Document store with a built-in cache layer

CQRS (Command Query Responsibility Segregation)

Separate read and write operations to optimize for different scalability requirements.

CQRS Architecture Overview (High Complexity)

Figure: Command Side (write operations) → Event Bus (publish events) → Read Side (optimized for reads).

CQRS Implementation Example

CQRS with Event Sourcing

const { v4: uuidv4 } = require('uuid');

// Command side - Write model
class UserCommandService {
    constructor(eventStore, eventBus) {
        this.eventStore = eventStore;
        this.eventBus = eventBus;
    }
    
    async createUser(command) {
        const userId = uuidv4();
        
        // Create events
        const events = [
            {
                type: 'UserCreated',
                data: {
                    userId,
                    email: command.email,
                    name: command.name,
                    timestamp: new Date().toISOString()
                }
            }
        ];
        
        // Store events
        await this.eventStore.appendEvents(userId, events);
        
        // Publish events
        for (const event of events) {
            await this.eventBus.publish(event);
        }
        
        return userId;
    }
    
    async updateUserEmail(userId, newEmail) {
        const events = [
            {
                type: 'UserEmailUpdated',
                data: {
                    userId,
                    newEmail,
                    timestamp: new Date().toISOString()
                }
            }
        ];
        
        await this.eventStore.appendEvents(userId, events);
        
        for (const event of events) {
            await this.eventBus.publish(event);
        }
    }
}

// Read side - Optimized for queries
class UserReadService {
    constructor(readDatabase) {
        this.readDatabase = readDatabase;
    }
    
    async getUser(userId) {
        // Direct query to optimized read database
        return await this.readDatabase.users
            .findOne({ _id: userId })
            .lean(); // Fast read-only query
    }
    
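    async searchUsers(query, page = 1, limit = 20) {
        // Escape regex metacharacters so user input is matched literally
        const escaped = query.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
        
        // Complex queries on optimized schema
        const filter = {
            $or: [
                { name: { $regex: escaped, $options: 'i' } },
                { email: { $regex: escaped, $options: 'i' } }
            ]
        };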
        
        return await this.readDatabase.users
            .find(filter)
            .skip((page - 1) * limit)
            .limit(limit)
            .lean();
    }
}

// Event handler to update read model
class UserProjection {
    constructor(readDatabase) {
        this.readDatabase = readDatabase;
    }
    
    async handleUserCreated(event) {
        await this.readDatabase.users.insertOne({
            _id: event.data.userId,
            email: event.data.email,
            name: event.data.name,
            createdAt: event.data.timestamp,
            updatedAt: event.data.timestamp
        });
    }
    
    async handleUserEmailUpdated(event) {
        await this.readDatabase.users.updateOne(
            { _id: event.data.userId },
            {
                $set: {
                    email: event.data.newEmail,
                    updatedAt: event.data.timestamp
                }
            }
        );
    }
}

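When to use CQRS: Only when read and write workloads have significantly different requirements. The read model is updated asynchronously from events, so it is only eventually consistent with the write side, and handling that lag adds complexity.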
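
The consistency cost is easiest to see in a small in-memory sketch. Everything here is a stand-in introduced for illustration: `InMemoryEventBus`, `eventLog`, `readModel`, and `handleCommand` are hypothetical, and a real system would use a durable event store and a message broker.

```javascript
// In-memory stand-ins, not a real event store or broker.
class InMemoryEventBus {
    constructor() {
        this.handlers = new Map(); // event type -> handler[]
        this.queue = [];
    }

    subscribe(type, handler) {
        const list = this.handlers.get(type) || [];
        list.push(handler);
        this.handlers.set(type, list);
    }

    // publish() only enqueues; delivery happens later, as with a real broker.
    publish(event) {
        this.queue.push(event);
    }

    async deliverPending() {
        while (this.queue.length > 0) {
            const event = this.queue.shift();
            for (const handler of this.handlers.get(event.type) || []) {
                await handler(event);
            }
        }
    }
}

const eventLog = [];         // write side: append-only events
const readModel = new Map(); // read side: denormalized view keyed by userId
const bus = new InMemoryEventBus();

// Projections: keep the read model in step with the events.
bus.subscribe('UserCreated', (e) => {
    readModel.set(e.data.userId, { email: e.data.email, name: e.data.name });
});
bus.subscribe('UserEmailUpdated', (e) => {
    const user = readModel.get(e.data.userId);
    if (user) user.email = e.data.newEmail;
});

// Command handling: record the event, then hand it to the bus.
function handleCommand(event) {
    eventLog.push(event);
    bus.publish(event);
}
```

Between `handleCommand` returning and `deliverPending` running, a query against `readModel` does not yet see the new user. Callers have to tolerate that gap or read from the write side when they need their own latest write.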

Building Scalable Systems

Principles and patterns for designing systems that can handle growth gracefully.

Scalability Dimensions (High Complexity)

Vertical Scaling: Add more resources to a single node (CPU, RAM)

Horizontal Scaling: Add more nodes to distribute load

Geographic Scaling: Distribute across regions for latency

Functional Scaling: Split by business capabilities (microservices)
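
Horizontal scaling usually needs a rule for deciding which node owns which piece of data. Below is a minimal sketch of hash-based placement, given as one possible approach rather than a recommendation. `fnv1a`, `shardFor`, and `movedFraction` are illustrative names. The hash is an FNV-1a style function applied to UTF-16 code units, which is an assumption made to keep the example dependency-free.

```javascript
/** Deterministic 32-bit string hash (FNV-1a style, over UTF-16 code units). */
function fnv1a(str) {
    let hash = 0x811c9dc5;
    for (let i = 0; i < str.length; i++) {
        hash ^= str.charCodeAt(i);
        hash = Math.imul(hash, 0x01000193) >>> 0; // multiply, keep unsigned 32 bits
    }
    return hash;
}

/** Route a key to one of `nodeCount` nodes, numbered 0 to nodeCount - 1. */
function shardFor(key, nodeCount) {
    return fnv1a(key) % nodeCount;
}

/** Fraction of keys whose node changes when the node count changes. */
function movedFraction(keys, fromNodes, toNodes) {
    const moved = keys.filter((k) => shardFor(k, fromNodes) !== shardFor(k, toNodes));
    return moved.length / keys.length;
}
```

Calling `movedFraction(keys, 4, 5)` on a sample of your own keys shows how much data would have to move when a fifth node joins. With plain modulo placement that share is typically large, which is why consistent hashing is a common alternative.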

Scalability Checklist

Area	Requirements	Tools/Solutions
Database	Sharding, replication, read replicas, connection pooling	MongoDB sharding, PostgreSQL read replicas, Redis Cluster
Caching	Multi-level caching, cache invalidation, CDN	Redis Cluster, Varnish, CloudFront, Akamai
Message Queue	Async processing, event-driven architecture, backpressure	Kafka, RabbitMQ, AWS SQS, Google Pub/Sub
Monitoring	Real-time metrics, distributed tracing, logging	Prometheus, Grafana, Jaeger, ELK Stack
Deployment	Blue-green, canary, rolling updates, auto-scaling	Kubernetes, Docker, AWS ECS, Terraform

Golden Rule: Design for failure. Assume everything will fail at some point and build resilience through redundancy, circuit breakers, and graceful degradation.
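
As an illustration of the circuit breaker mentioned above, here is a minimal sketch. The `CircuitBreaker` class and its `failureThreshold` and `resetTimeoutMs` options are assumptions made for this example. It is deliberately simplified: it does not limit concurrent trial requests while half-open.

```javascript
/**
 * Minimal circuit breaker: closed -> open after N consecutive failures,
 * open -> half-open after a cool-down, half-open -> closed on one success.
 * `now` is injectable so the behaviour can be checked deterministically.
 */
class CircuitBreaker {
    constructor(action, { failureThreshold = 3, resetTimeoutMs = 10000 } = {}) {
        this.action = action;
        this.failureThreshold = failureThreshold;
        this.resetTimeoutMs = resetTimeoutMs;
        this.state = 'closed';
        this.failures = 0;
        this.openedAt = 0;
    }

    async call(fallback, now = Date.now()) {
        if (this.state === 'open') {
            if (now - this.openedAt < this.resetTimeoutMs) {
                return fallback(); // fail fast without touching the dependency
            }
            this.state = 'half-open'; // cool-down over: allow a trial request
        }

        try {
            const result = await this.action();
            this.state = 'closed';
            this.failures = 0;
            return result;
        } catch (error) {
            this.failures += 1;
            if (this.state === 'half-open' || this.failures >= this.failureThreshold) {
                this.state = 'open';
                this.openedAt = now;
            }
            return fallback(); // graceful degradation
        }
    }
}
```

The `fallback` is where graceful degradation happens, for example returning cached or default data instead of an error while the dependency recovers.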

Real-World Architecture: E-commerce Platform

How to apply these patterns in a real e-commerce system handling 10,000 requests per second.

Figure: E-commerce Platform Architecture. Mobile App (client) → CDN (static assets) → WAF (security) → API Gateway (routing) → Load Balancer (traffic distribution) → services: Cart Service (Redis cache), Product Service (read replicas), Payment Service (queue-based), Order Service (event-driven).

Key Design Decisions

Product Catalog: Use read replicas for product queries, Redis cache for hot products

Shopping Cart: Redis for ephemeral cart data, session-based with TTL

Checkout Process: Saga pattern for distributed transactions, queue for payment processing

Inventory Management: Event sourcing for inventory changes, CQRS for reporting

Search: Elasticsearch for product search, updated via change data capture
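
The saga pattern named in the checkout decision can be sketched as a list of steps, each paired with a compensating action. This is a minimal orchestration-style sketch under stated assumptions. `runSaga`, `checkoutSteps`, and the step names are hypothetical, and the service calls are simulated by appending to a log.

```javascript
/**
 * Run saga steps in order; if one fails, run the compensations of the
 * steps that already succeeded, in reverse order.
 * Each step is { name, execute(ctx), compensate(ctx) }.
 */
async function runSaga(steps, ctx) {
    const completed = [];
    for (const step of steps) {
        try {
            await step.execute(ctx);
            completed.push(step);
        } catch (error) {
            for (const done of completed.reverse()) {
                await done.compensate(ctx);
            }
            return { ok: false, failedStep: step.name, reason: error.message };
        }
    }
    return { ok: true };
}

// Hypothetical checkout steps; the service calls are simulated with a log.
function checkoutSteps(log, { paymentFails = false } = {}) {
    return [
        {
            name: 'reserveInventory',
            execute: async () => { log.push('inventory reserved'); },
            compensate: async () => { log.push('inventory released'); },
        },
        {
            name: 'chargePayment',
            execute: async () => {
                if (paymentFails) throw new Error('card declined');
                log.push('payment charged');
            },
            compensate: async () => { log.push('payment refunded'); },
        },
        {
            name: 'createOrder',
            execute: async () => { log.push('order created'); },
            compensate: async () => { log.push('order cancelled'); },
        },
    ];
}
```

When the payment step fails, only the inventory reservation is undone, because it is the only step that completed. A production saga would also need to persist its progress and retry failed compensations, which this sketch leaves out.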