Design an E-Commerce Platform (Amazon)

Medium

An e-commerce platform like Amazon is a complex, large-scale system that handles everything from product browsing to payment processing and order fulfillment. This design will focus on the core challenges: managing a massive product catalog, ensuring transactional integrity during checkout, and handling real-time inventory updates.

Variants:

eBay, Walmart, Alibaba

E-Commerce System Design Requirements

Functional Requirements

Product Catalog & Search

Users should be able to browse, search, and filter a large catalog of products.

Shopping Cart & Checkout

Users should be able to add items to a shopping cart and complete a purchase through a secure checkout process.

Inventory Management

The system must accurately track stock levels for all products in real-time to prevent overselling.

Order Management & Notifications

Users should be able to view their order history and receive notifications about order status (e.g., shipped, delivered).

Non-Functional Requirements

High Availability

The platform must be available 24/7, especially during peak shopping seasons like Black Friday. Downtime directly translates to lost revenue.

Data Consistency

Inventory and order data must be strongly consistent to avoid race conditions like overselling items or processing duplicate orders.

Scalability & Performance

The system must handle high traffic loads, with fast page load times and a responsive checkout process to minimize cart abandonment.

Security

All user data, especially payment information, must be handled securely, adhering to standards like PCI DSS.

CAP Theorem Trade-offs

Trade-off Explanation:

For core e-commerce transactions like inventory and payments, the system must prioritize Consistency and Partition Tolerance (CP). It's better to fail a transaction than to sell an out-of-stock item or double-charge a customer.

Scale Estimates

User Base: 200M users (2 * 10^8)

Base assumption for system sizing.

Orders per second: 1,000 orders/sec

Peak orders per second during a major sales event.

Total Products: 100M

Concurrent Users: 10M
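A few derived numbers help when sizing the system. The sketch below uses the four figures listed above; the peak duration and the per-product row size are illustrative assumptions, not part of the estimates.

```python
# Figures taken from the scale estimates above.
TOTAL_USERS = 200_000_000
CONCURRENT_USERS = 10_000_000
PEAK_ORDERS_PER_SECOND = 1_000
TOTAL_PRODUCTS = 100_000_000

# Assumptions for illustration only (not from the estimates above).
ASSUMED_PEAK_DURATION_HOURS = 1
ASSUMED_BYTES_PER_PRODUCT_ROW = 2_000

# Share of the user base that is online at the same time.
concurrent_fraction = CONCURRENT_USERS / TOTAL_USERS

# Orders written if the peak rate is sustained for the assumed window.
orders_in_peak_window = PEAK_ORDERS_PER_SECOND * 3600 * ASSUMED_PEAK_DURATION_HOURS

# Rough size of the product table under the assumed row size.
catalog_size_gb = TOTAL_PRODUCTS * ASSUMED_BYTES_PER_PRODUCT_ROW / 1e9

print(f"Concurrent share of user base: {concurrent_fraction:.0%}")
print(f"Orders in a {ASSUMED_PEAK_DURATION_HOURS}h peak window: {orders_in_peak_window:,}")
print(f"Approximate catalog size: {catalog_size_gb:.0f} GB")
```

Under these assumptions the order write rate is modest for a relational database, while the catalog is large enough to justify a separate search index.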

API Design

GET /api/products/search

Search for products with filters and pagination.

Query Parameters

query=string&category=string&page=1

GET requests conventionally carry no body, so the search filters travel in the query string.

Response Body

[{ "product_id": "string", "name": "string", "price": 99.99 }]

POST /api/cart

Add an item to the current user's shopping cart.

Request Body

{ "product_id": "string", "quantity": 1 }

Response Body

{ "cart_id": "string", "item_count": 3 }

POST /api/checkout

Initiate the checkout process for the user's cart.

Request Body

{ "cart_id": "string", "payment_token": "string", "shipping_address": "{...}" }

Response Body

{ "order_id": "string", "status": "processing" }

GET /api/orders/{order_id}

Get the status and details of a specific order.

Response Body

{ "order_id": "string", "status": "shipped", "tracking_number": "string" }
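The checkout payloads above could be modeled on the server side roughly as follows. This is a minimal sketch: the class and function names are hypothetical, and it assumes shipping_address arrives as a JSON object, which the elided example does not confirm.

```python
import json
from dataclasses import dataclass, asdict


@dataclass
class CheckoutRequest:
    cart_id: str
    payment_token: str
    shipping_address: dict  # assumed to be a JSON object


@dataclass
class CheckoutResponse:
    order_id: str
    status: str


REQUIRED_FIELDS = ("cart_id", "payment_token", "shipping_address")


def parse_checkout_request(raw: str) -> CheckoutRequest:
    """Parse a POST /api/checkout body and reject it if a field is missing."""
    data = json.loads(raw)
    missing = [name for name in REQUIRED_FIELDS if name not in data]
    if missing:
        raise ValueError(f"missing fields: {', '.join(missing)}")
    return CheckoutRequest(
        cart_id=data["cart_id"],
        payment_token=data["payment_token"],
        shipping_address=data["shipping_address"],
    )


def render_checkout_response(order_id: str) -> str:
    """Serialize the response returned right after checkout starts."""
    return json.dumps(asdict(CheckoutResponse(order_id=order_id, status="processing")))
```

Validating the body before any inventory or payment work begins keeps malformed requests from ever reaching the transactional path.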

Database Schema

For an e-commerce platform, data consistency is paramount, especially for orders and inventory. A relational SQL database like PostgreSQL or MySQL is an excellent choice for the core transactional database.

Products

product_id: UUID, Primary Key
name: VARCHAR
description: TEXT
price: DECIMAL
category_id: UUID

Inventory

product_id: UUID, Primary Key
stock_quantity: INT
version: INT
updated_at: TIMESTAMP

Orders

order_id: UUID, Primary Key
user_id: UUID
status: VARCHAR
total_amount: DECIMAL
created_at: TIMESTAMP

Order_Items

order_item_id: UUID, Primary Key
order_id: UUID, FK
product_id: UUID, FK
quantity: INT
price: DECIMAL
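The four tables can be tried out locally with a sketch like the one below. It uses SQLite through Python's standard library purely for convenience, so UUIDs and timestamps are stored as text; the NOT NULL and CHECK constraints and the foreign key from inventory to products are assumptions added for illustration, not part of the schema listed above.

```python
import sqlite3

SCHEMA = """
CREATE TABLE products (
    product_id  TEXT PRIMARY KEY,            -- UUID stored as text in SQLite
    name        TEXT NOT NULL,
    description TEXT,
    price       NUMERIC NOT NULL,
    category_id TEXT
);
CREATE TABLE inventory (
    product_id     TEXT PRIMARY KEY REFERENCES products(product_id),
    stock_quantity INTEGER NOT NULL CHECK (stock_quantity >= 0),  -- assumed guard
    version        INTEGER NOT NULL DEFAULT 0,
    updated_at     TEXT DEFAULT CURRENT_TIMESTAMP
);
CREATE TABLE orders (
    order_id     TEXT PRIMARY KEY,
    user_id      TEXT NOT NULL,
    status       TEXT NOT NULL,
    total_amount NUMERIC NOT NULL,
    created_at   TEXT DEFAULT CURRENT_TIMESTAMP
);
CREATE TABLE order_items (
    order_item_id TEXT PRIMARY KEY,
    order_id      TEXT NOT NULL REFERENCES orders(order_id),
    product_id    TEXT NOT NULL REFERENCES products(product_id),
    quantity      INTEGER NOT NULL,
    price         NUMERIC NOT NULL            -- price at the time of purchase
);
"""

conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)

tables = sorted(
    row[0] for row in conn.execute("SELECT name FROM sqlite_master WHERE type = 'table'")
)
print(tables)
```

Storing price on each order item, separate from the product's current price, means later price changes do not alter historical orders.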

Core Services

For a system design interview, it's often best to start with a simplified, coarse-grained service architecture and then break it down further if asked. We can consolidate the core logic into two main services:

Product Catalog Service: This service is responsible for everything related to browsing and finding products. It would manage the product information, handle the complex search and filtering logic (likely by interfacing with Elasticsearch), and be responsible for checking and updating inventory levels. Consolidating these functions simplifies the "read" path of the user journey.
Order Service: This service is responsible for the entire transactional part of the platform. It manages the shopping cart, orchestrates the multi-step checkout process, integrates with third-party payment gateways like Stripe to handle payments securely, and records the final order in the database. This keeps all the high-stakes, transactional logic in one place.

Deep Dive: Inventory Management

One of the hardest problems in e-commerce is preventing two customers from buying the last item in stock at the same time (a race condition). This requires strong transactional consistency. When a user tries to check out, the system must atomically check the inventory and decrease the stock count if available.

Pessimistic Locking (Row-Level Lock)

In this approach, when a checkout process begins, we place an exclusive lock on the inventory row for each product in the cart.

BEGIN TRANSACTION;
SELECT stock_quantity FROM Inventory WHERE product_id = 'X' FOR UPDATE;
The FOR UPDATE clause locks the row. No other transaction can update the row or take its own lock on it until the current transaction commits or rolls back. (In PostgreSQL and MySQL/InnoDB, plain non-locking reads are not blocked; they are served from a snapshot.)
If stock_quantity is sufficient, we update the count: UPDATE Inventory SET stock_quantity = stock_quantity - 1 WHERE product_id = 'X'; If it is not, we roll back and report the item as out of stock.
COMMIT;

Pros: Guarantees consistency. Because every checkout checks and decrements the stock while holding the lock, two checkouts cannot both claim the last unit.
Cons: Can hurt performance and increase latency, as other checkout processes for the same popular item must wait for the lock to be released. This can lead to a poor user experience, especially during flash sales.
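The lock-then-check-then-update flow can be sketched with Python's standard sqlite3 module. SQLite has no SELECT ... FOR UPDATE, so this sketch swaps in BEGIN IMMEDIATE, which takes a database-wide write lock rather than a row-level one; the function name reserve_pessimistic is hypothetical.

```python
import sqlite3


def reserve_pessimistic(conn: sqlite3.Connection, product_id: str, qty: int = 1) -> bool:
    """Take the write lock first, then check and decrement. True if reserved."""
    # BEGIN IMMEDIATE acquires SQLite's write lock up front, so other writers
    # wait (up to the connection timeout) until this transaction ends.
    conn.execute("BEGIN IMMEDIATE")
    try:
        row = conn.execute(
            "SELECT stock_quantity FROM Inventory WHERE product_id = ?",
            (product_id,),
        ).fetchone()
        if row is None or row[0] < qty:
            conn.execute("ROLLBACK")  # not enough stock: release the lock
            return False
        conn.execute(
            "UPDATE Inventory SET stock_quantity = stock_quantity - ? WHERE product_id = ?",
            (qty, product_id),
        )
        conn.execute("COMMIT")
        return True
    except Exception:
        if conn.in_transaction:
            conn.execute("ROLLBACK")
        raise


# isolation_level=None lets us issue BEGIN/COMMIT ourselves.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute(
    "CREATE TABLE Inventory (product_id TEXT PRIMARY KEY, stock_quantity INTEGER NOT NULL)"
)
conn.execute("INSERT INTO Inventory VALUES ('X', 1)")

first = reserve_pessimistic(conn, "X")   # takes the last unit
second = reserve_pessimistic(conn, "X")  # finds no stock left
print(first, second)
```

With PostgreSQL or MySQL the same structure would use the SELECT ... FOR UPDATE statement shown above, which locks only the affected rows.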

Optimistic Locking (Versioning)

This approach avoids long-lived locks. Instead, we add a version number to the inventory table.

Read the stock_quantity and version for the product.
In the application, check if the quantity is sufficient.
If it is, attempt to update the stock: UPDATE Inventory SET stock_quantity = stock_quantity - 1, version = version + 1 WHERE product_id = 'X' AND version = {read_version};
Check how many rows were affected by the update.
- If 1 row was affected, the transaction was successful.
- If 0 rows were affected, another transaction updated the row (and incremented the version) between our read and write, so our update did not apply. A version conflict does not by itself mean the item is sold out: the service should re-read the row and retry, and report the item as out of stock only when the fresh stock_quantity is insufficient.

Pros: More performant and scalable than pessimistic locking, as it doesn't hold a lock between the read and the write.
Cons: Can lead to higher rates of conflicts and retries under high contention, which might be frustrating for users.
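The versioned update can be sketched with Python's standard sqlite3 module. The function names are hypothetical, and the bounded retry on a version conflict is an assumption of this sketch rather than a required part of the technique.

```python
import sqlite3


def try_decrement(conn: sqlite3.Connection, product_id: str, qty: int, read_version: int) -> bool:
    """Apply the update only if the row still has the version we read."""
    cur = conn.execute(
        "UPDATE Inventory SET stock_quantity = stock_quantity - ?, version = version + 1 "
        "WHERE product_id = ? AND version = ?",
        (qty, product_id, read_version),
    )
    conn.commit()
    return cur.rowcount == 1  # 0 rows means another writer got there first


def reserve_optimistic(conn: sqlite3.Connection, product_id: str, qty: int = 1,
                       max_retries: int = 3) -> bool:
    """Read, check in the application, then attempt the versioned update."""
    for _ in range(max_retries):
        row = conn.execute(
            "SELECT stock_quantity, version FROM Inventory WHERE product_id = ?",
            (product_id,),
        ).fetchone()
        if row is None or row[0] < qty:
            return False  # insufficient stock on a fresh read
        if try_decrement(conn, product_id, qty, read_version=row[1]):
            return True
        # Version conflict: loop to re-read and try again.
    return False  # gave up after repeated conflicts


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Inventory (product_id TEXT PRIMARY KEY, "
    "stock_quantity INTEGER NOT NULL, version INTEGER NOT NULL)"
)
conn.execute("INSERT INTO Inventory VALUES ('X', 5, 0)")
conn.commit()

# Two writers that both read version 0: only the first update applies.
writer_a = try_decrement(conn, "X", 1, read_version=0)
writer_b = try_decrement(conn, "X", 1, read_version=0)
print(writer_a, writer_b)
```

Keeping the version comparison inside the UPDATE statement is what makes the check and the write a single atomic step in the database.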

Distributed Locking with Redis/Zookeeper

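In a microservices architecture, the inventory logic might be handled by multiple instances of an Inventory Service. When every instance writes to the same database, row-level locks or a version check still serialize the updates. A distributed lock becomes relevant when the critical section spans more than a single database transaction, for example when the checkout must hold a reservation while it calls other services, or when stock is also tracked outside the database.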

Before processing a checkout, the service instance must acquire a lock for each product_id in the cart from a distributed locking service like Redis or Zookeeper.
A common implementation uses a Redis command like SET product:lock:{product_id} {token} NX PX 30000, where {token} is a random value unique to this lock holder. The NX option sets the key only if it doesn't already exist, so checking for the lock and taking it happen in one atomic step. PX sets an expiry time in milliseconds (here, 30 seconds) so the lock is not held forever if a service instance crashes.
If the lock is acquired successfully, the service can perform the read-modify-write operations on the inventory database.
After the database transaction is complete, the service must explicitly release the lock by deleting the Redis key, but only if the key still holds its own token. Otherwise an instance whose lock has already expired could delete a lock that now belongs to another instance.

Pros: Serializes access across multiple service instances, preventing race conditions between them.
Cons: Introduces a new dependency (Redis/Zookeeper) and adds network latency to the checkout process. It's also more complex to implement correctly. A lock that expires while its holder is still working no longer protects the update, and lock-release failures need to be handled carefully, so the database write should still be guarded, for example with the version check used in optimistic locking.
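The acquire and release steps can be sketched without a running Redis server. InMemoryLockStore below is a hypothetical single-process stand-in: its set_if_absent method mirrors SET key value NX PX ttl, and delete_if_equals mirrors an atomic compare-and-delete, which on Redis is usually done in a small Lua script. Storing a unique token per holder is an assumption of this sketch.

```python
import time
import uuid
from typing import Dict, Optional, Tuple


class InMemoryLockStore:
    """Hypothetical stand-in for a Redis-like store (single process only)."""

    def __init__(self) -> None:
        self._entries: Dict[str, Tuple[str, float]] = {}  # key -> (token, expires_at)

    def set_if_absent(self, key: str, token: str, ttl_ms: int) -> bool:
        """Mirror of SET key token NX PX ttl_ms."""
        now = time.monotonic()
        current = self._entries.get(key)
        if current is not None and current[1] > now:
            return False  # someone holds an unexpired lock
        self._entries[key] = (token, now + ttl_ms / 1000)
        return True

    def delete_if_equals(self, key: str, token: str) -> bool:
        """Delete the key only if it still holds the caller's token."""
        current = self._entries.get(key)
        if current is not None and current[0] == token:
            del self._entries[key]
            return True
        return False


def acquire_lock(store: InMemoryLockStore, product_id: str, ttl_ms: int = 30_000) -> Optional[str]:
    """Return a token if the lock was taken, otherwise None."""
    token = uuid.uuid4().hex  # unique per holder, so release can verify ownership
    if store.set_if_absent(f"product:lock:{product_id}", token, ttl_ms):
        return token
    return None


def release_lock(store: InMemoryLockStore, product_id: str, token: str) -> bool:
    """Release the lock only if we still own it."""
    return store.delete_if_equals(f"product:lock:{product_id}", token)


store = InMemoryLockStore()
holder = acquire_lock(store, "X")          # first instance gets the lock
contender = acquire_lock(store, "X")       # second instance is refused
released = release_lock(store, "X", holder)
print(holder is not None, contender, released)
```

The token check on release matters because of the expiry: if the first holder's lock times out and another instance takes it, a blind delete would remove the new holder's lock.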

Deep Dive: Search & Filtering

A simple SQL LIKE query is not sufficient for a modern e-commerce search experience. We need to support typo tolerance, relevance ranking, and faceted search (filtering by brand, price, size, etc.). A dedicated search engine like Elasticsearch is well suited to this. We would use a Change Data Capture (CDC) pipeline to stream any changes from our Products table in the SQL database to an Elasticsearch cluster, keeping our search index up-to-date in near real-time.
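A search request against such an index might be built as follows. This sketch only constructs the Elasticsearch query body; the index field names (name, category, price, brand), the function name, and the page size are assumptions about a mapping the design does not specify.

```python
from typing import Optional


def build_search_body(query: str, category: Optional[str] = None,
                      min_price: Optional[float] = None,
                      max_price: Optional[float] = None,
                      page: int = 1, page_size: int = 20) -> dict:
    """Build an Elasticsearch query body with fuzzy matching, filters, and facets."""
    filters = []
    if category is not None:
        filters.append({"term": {"category": category}})

    price_range = {}
    if min_price is not None:
        price_range["gte"] = min_price
    if max_price is not None:
        price_range["lte"] = max_price
    if price_range:
        filters.append({"range": {"price": price_range}})

    return {
        "from": (page - 1) * page_size,  # pagination offset
        "size": page_size,
        "query": {
            "bool": {
                # "must" clauses contribute to the relevance score;
                # fuzziness AUTO tolerates small typos in the query text.
                "must": [{"match": {"name": {"query": query, "fuzziness": "AUTO"}}}],
                # "filter" clauses narrow results without affecting the score.
                "filter": filters,
            }
        },
        # Terms aggregation returns per-brand counts for the facet sidebar.
        "aggs": {"brands": {"terms": {"field": "brand"}}},
    }


body = build_search_body("wireles headphones", category="electronics", max_price=100, page=2)
print(body["from"], body["query"]["bool"]["filter"])
```

Putting the category and price constraints in the filter clause lets the engine cache them and keeps relevance ranking driven by the text match alone.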

Deep Dive: Order Notifications

Keeping customers informed about their order status is crucial for a good experience. This is a perfect use case for an asynchronous messaging system.

Push Notification Architecture

When an order's status changes in the Order Service (e.g., from "processing" to "shipped"), it publishes an event to a message queue like Apache Kafka. A dedicated Notification Service consumes these events. Based on the user's preferences and devices, this service then routes the notification to the appropriate push notification provider.

APNs (Apple Push Notification service): For iOS devices.
FCM (Firebase Cloud Messaging): For Android devices.
Amazon SNS (Simple Notification Service): Can also be used as a single endpoint to fan out notifications to multiple platforms, including mobile push, SMS, and email.

This decoupled architecture ensures that the Order Service isn't blocked by the process of sending notifications, and it allows the notification system to be scaled independently.
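The consume-and-route step of the Notification Service can be sketched as below. An in-process queue.Queue stands in for the Kafka topic, the device registry is a hypothetical dictionary, and instead of calling real provider APIs the sketch only records which provider each message would be handed to.

```python
import queue
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class OrderStatusEvent:
    order_id: str
    user_id: str
    status: str


# Which provider serves which device platform.
PROVIDER_FOR_PLATFORM = {"ios": "APNs", "android": "FCM"}


def route(event: OrderStatusEvent, devices: Dict[str, List[str]]) -> List[Tuple[str, str]]:
    """Return (provider, message) pairs for every registered device of the user."""
    message = f"Order {event.order_id} is now {event.status}"
    return [
        (PROVIDER_FOR_PLATFORM[platform], message)
        for platform in devices.get(event.user_id, [])
        if platform in PROVIDER_FOR_PLATFORM
    ]


def drain(events: "queue.Queue[OrderStatusEvent]", devices: Dict[str, List[str]]) -> List[Tuple[str, str]]:
    """Consume all pending events and collect the notifications to send."""
    outgoing: List[Tuple[str, str]] = []
    while True:
        try:
            event = events.get_nowait()
        except queue.Empty:
            return outgoing
        outgoing.extend(route(event, devices))


# Hypothetical device registry: user_id -> platforms with the app installed.
devices = {"user-1": ["ios", "android"], "user-2": ["android"]}

events: "queue.Queue[OrderStatusEvent]" = queue.Queue()
events.put(OrderStatusEvent("o-100", "user-1", "shipped"))    # published by the Order Service
events.put(OrderStatusEvent("o-101", "user-2", "delivered"))

sent = drain(events, devices)
print(sent)
```

Because the Order Service only enqueues events, a slow or failing push provider delays notifications but never the order status update itself.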
