Design an E-Commerce Platform (Amazon)
E-Commerce System Design Requirements
Functional Requirements
Product Catalog & Search
Users should be able to browse, search, and filter a large catalog of products.
Shopping Cart & Checkout
Users should be able to add items to a shopping cart and complete a purchase through a secure checkout process.
Inventory Management
The system must accurately track stock levels for all products in real-time to prevent overselling.
Order Management & Notifications
Users should be able to view their order history and receive notifications about order status (e.g., shipped, delivered).
Non-Functional Requirements
High Availability
The platform must be available 24/7, especially during peak shopping seasons like Black Friday. Downtime directly translates to lost revenue.
Data Consistency
Inventory and order data must be strongly consistent to avoid race conditions like overselling items or processing duplicate orders.
Scalability & Performance
The system must handle high traffic loads, with fast page load times and a responsive checkout process to minimize cart abandonment.
Security
All user data, especially payment information, must be handled securely, adhering to standards like PCI DSS.
CAP Theorem Trade-offs
Trade-off Explanation:
For core e-commerce transactions like inventory and payments, the system must prioritize Consistency and Partition Tolerance (CP). It's better to fail a transaction than to sell an out-of-stock item or double-charge a customer.
Scale Estimates
API Design
Search for products with filters and pagination.
Request Body
{ "query": "string", "category": "string", "page": 1 }
Response Body
[{ "product_id": "string", "name": "string", "price": 99.99 }]
Add an item to the current user's shopping cart.
Request Body
{ "product_id": "string", "quantity": 1 }
Response Body
{ "cart_id": "string", "item_count": 3 }
Initiate the checkout process for the user's cart.
Request Body
{ "cart_id": "string", "payment_token": "string", "shipping_address": "{...}" }
Response Body
{ "order_id": "string", "status": "processing" }
Get the status and details of a specific order.
Response Body
{ "order_id": "string", "status": "shipped", "tracking_number": "string" }
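The request and response shapes above can be written down as typed models. This is a minimal sketch using Python dataclasses. The class names, the fixed page size of 20, and treating tracking_number as optional until shipment are assumptions, not part of the API as described.

```python
from dataclasses import dataclass
from typing import Optional

PAGE_SIZE = 20  # assumption: the API above does not specify a page size


@dataclass
class SearchRequest:
    query: str
    category: Optional[str] = None
    page: int = 1  # 1-based page number, as in the example request

    def offset(self) -> int:
        """Translate the 1-based page number into a starting offset for the search backend."""
        if self.page < 1:
            raise ValueError("page must be >= 1")
        return (self.page - 1) * PAGE_SIZE


@dataclass
class ProductSummary:
    product_id: str
    name: str
    price: float  # mirrors the example; production systems usually use integer cents or Decimal


@dataclass
class AddToCartRequest:
    product_id: str
    quantity: int = 1

    def __post_init__(self) -> None:
        # Reject zero or negative quantities before they reach the cart.
        if self.quantity < 1:
            raise ValueError("quantity must be positive")


@dataclass
class OrderStatusResponse:
    order_id: str
    status: str  # e.g. "processing", "shipped", "delivered"
    tracking_number: Optional[str] = None  # assumption: absent until the order ships
```

For example, `SearchRequest("laptop", page=3).offset()` evaluates to 40. The search backend would use that value as its starting offset.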
Database Schema
For an e-commerce platform, data consistency is paramount, especially for orders and inventory. A relational SQL database like PostgreSQL or MySQL is an excellent choice for the core transactional database.
Products
Inventory
Orders
Order_Items
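The schema above lists only table names. A minimal sketch of one possible set of columns follows. It uses SQLite purely so it runs without a database server. Only product_id, name, price (stored here as integer cents), category, stock_quantity, version, order_id, status, and tracking_number come from this design. The columns user_id, created_at, quantity, and unit_price_cents are assumptions, and a PostgreSQL or MySQL deployment would use its own money and timestamp types.

```python
import sqlite3

SCHEMA = """
CREATE TABLE Products (
    product_id  TEXT PRIMARY KEY,
    name        TEXT NOT NULL,
    category    TEXT,
    price_cents INTEGER NOT NULL CHECK (price_cents >= 0)  -- integer cents avoid float rounding
);

CREATE TABLE Inventory (
    product_id     TEXT PRIMARY KEY REFERENCES Products(product_id),
    stock_quantity INTEGER NOT NULL CHECK (stock_quantity >= 0),  -- DB-level guard against going negative
    version        INTEGER NOT NULL DEFAULT 0                     -- used by optimistic locking
);

CREATE TABLE Orders (
    order_id        TEXT PRIMARY KEY,
    user_id         TEXT NOT NULL,
    status          TEXT NOT NULL,
    tracking_number TEXT,
    created_at      TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP
);

CREATE TABLE Order_Items (
    order_id         TEXT NOT NULL REFERENCES Orders(order_id),
    product_id       TEXT NOT NULL REFERENCES Products(product_id),
    quantity         INTEGER NOT NULL CHECK (quantity > 0),
    unit_price_cents INTEGER NOT NULL,  -- price captured at purchase time
    PRIMARY KEY (order_id, product_id)
);
"""


def create_schema(conn: sqlite3.Connection) -> None:
    # SQLite only enforces REFERENCES constraints when this pragma is on.
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(SCHEMA)
```

The CHECK on stock_quantity acts as a database-level backstop. Even if application logic has a bug, a decrement below zero is rejected rather than silently overselling.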
Core Services
For a system design interview, it's often best to start with a simplified, coarse-grained service architecture and then break it down further if asked. We can consolidate the core logic into two main services:
- Product Catalog Service: This service is responsible for everything related to browsing and finding products. It would manage the product information, handle the complex search and filtering logic (likely by interfacing with Elasticsearch), and be responsible for checking and updating inventory levels. Consolidating these functions simplifies the "read" path of the user journey.
- Order Service: This service is responsible for the entire transactional part of the platform. It manages the shopping cart, orchestrates the multi-step checkout process, integrates with third-party payment gateways like Stripe to handle payments securely, and records the final order in the database. This keeps all the high-stakes, transactional logic in one place.
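The Order Service's multi-step checkout can be sketched as an orchestrator with three steps: reserve inventory, charge the payment token, then record the order. If payment fails, it undoes the reservation. This is a minimal sketch only. The four step functions are hypothetical injected callables standing in for the Product Catalog Service, a payment gateway such as Stripe, and the orders database. They are not real client APIs.

```python
from dataclasses import dataclass
from typing import Callable, Dict


class CheckoutError(Exception):
    pass


@dataclass
class CheckoutDeps:
    # Hypothetical hooks; a real system would call other services here.
    reserve_stock: Callable[[Dict[str, int]], bool]  # True if every item was reserved
    release_stock: Callable[[Dict[str, int]], None]  # undo a prior reservation
    charge: Callable[[str, int], bool]               # (payment_token, amount_cents) -> success
    record_order: Callable[[Dict[str, int]], str]    # persists the order, returns order_id


def checkout(deps: CheckoutDeps, items: Dict[str, int],
             payment_token: str, amount_cents: int) -> dict:
    """Run the checkout steps in order, compensating earlier steps when a later one fails."""
    if not deps.reserve_stock(items):
        raise CheckoutError("out of stock")
    try:
        if not deps.charge(payment_token, amount_cents):
            raise CheckoutError("payment declined")
    except Exception:
        deps.release_stock(items)  # give the reserved units back before surfacing the error
        raise
    # Assumption: if record_order fails after a successful charge, a real system
    # must refund or retry; that path is omitted from this sketch.
    order_id = deps.record_order(items)
    return {"order_id": order_id, "status": "processing"}
```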
Deep Dive: Inventory Management
One of the hardest problems in e-commerce is preventing two customers from buying the last item in stock at the same time (a race condition). This requires strong transactional consistency. When a user tries to check out, the system must atomically check the inventory and decrease the stock count if available.
Pessimistic Locking (Row-Level Lock)
In this approach, when a checkout process begins, we place an exclusive lock on the inventory row for each product in the cart.
BEGIN TRANSACTION;
SELECT stock_quantity FROM Inventory WHERE product_id = 'X' FOR UPDATE;
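- The FOR UPDATE clause locks the row. Any other transaction that tries to update, delete, or lock the same row (including with its own SELECT ... FOR UPDATE) must wait until the current transaction commits or rolls back. In MVCC databases like PostgreSQL and MySQL (InnoDB), plain non-locking SELECTs can still read the last committed value.
- If stock_quantity is sufficient, we decrement the count: UPDATE Inventory SET stock_quantity = stock_quantity - 1 WHERE product_id = 'X'; otherwise we ROLLBACK and report the item as out of stock.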
COMMIT;
Pros: Guarantees consistency. It's impossible to oversell. Cons: Can hurt performance and increase latency, as other checkout processes for the same popular item must wait for the lock to be released. This can lead to poor user experience, especially during flash sales.
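The lock, check, and decrement sequence above can be sketched in Python against SQLite so it runs without a server. SQLite has no SELECT ... FOR UPDATE, so this sketch substitutes BEGIN IMMEDIATE. That statement takes the database-wide write lock at the start of the transaction. It is coarser than the row-level lock in PostgreSQL or MySQL, but it follows the same pessimistic pattern: lock first, then read, check, and write. The function name is hypothetical. The sketch also assumes the connection was opened with isolation_level=None so that transactions are controlled explicitly.

```python
import sqlite3


def checkout_pessimistic(conn: sqlite3.Connection, product_id: str, qty: int) -> bool:
    """Lock first, then check and decrement stock inside one transaction.

    Assumes conn was opened with isolation_level=None (explicit transactions).
    """
    # Other writers now block (or time out) until we COMMIT or ROLLBACK.
    conn.execute("BEGIN IMMEDIATE")
    try:
        # In PostgreSQL/MySQL this SELECT would end with FOR UPDATE to lock only this row.
        row = conn.execute(
            "SELECT stock_quantity FROM Inventory WHERE product_id = ?", (product_id,)
        ).fetchone()
        if row is None or row[0] < qty:
            conn.execute("ROLLBACK")  # unknown product or not enough stock
            return False
        conn.execute(
            "UPDATE Inventory SET stock_quantity = stock_quantity - ? WHERE product_id = ?",
            (qty, product_id),
        )
        conn.execute("COMMIT")
        return True
    except Exception:
        conn.execute("ROLLBACK")
        raise
```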
Optimistic Locking (Versioning)
This approach avoids long-lived locks. Instead, we add a version number to the inventory table.
- Read the stock_quantity and version for the product.
- In the application, check if the quantity is sufficient.
- If it is, attempt to update the stock:
UPDATE Inventory SET stock_quantity = stock_quantity - 1, version = version + 1 WHERE product_id = 'X' AND version = {read_version};
- Check how many rows were affected by the update.
- If 1 row was affected, the update succeeded.
- If 0 rows were affected, another transaction updated the row (and incremented the version) between our read and our write. Our attempt has failed, so we roll back. We then either retry from the read step, or inform the user if retries are exhausted or the fresh read shows the item is out of stock.
Pros: More performant and scalable than pessimistic locking, because no lock is held while the application decides. Cons: Under high contention (e.g., a flash sale on a single item), many attempts fail and must be retried, which adds latency and can frustrate users.
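In this sketch, the read, check, and conditional-update steps above map onto small functions, again using SQLite for portability. The retry loop and its limit of three attempts are assumptions layered on top of the described steps. The function names are hypothetical.

```python
import sqlite3
from typing import Optional, Tuple


def read_stock(conn: sqlite3.Connection, product_id: str) -> Optional[Tuple[int, int]]:
    """Step 1: read (stock_quantity, version) without taking any lock."""
    return conn.execute(
        "SELECT stock_quantity, version FROM Inventory WHERE product_id = ?", (product_id,)
    ).fetchone()


def try_decrement(conn: sqlite3.Connection, product_id: str, qty: int, read_version: int) -> bool:
    """Step 3: conditional write that succeeds only if nobody bumped the version since our read."""
    cur = conn.execute(
        "UPDATE Inventory SET stock_quantity = stock_quantity - ?, version = version + 1 "
        "WHERE product_id = ? AND version = ?",
        (qty, product_id, read_version),
    )
    conn.commit()
    return cur.rowcount == 1  # 0 rows means a concurrent writer won the race


def reserve_optimistic(conn: sqlite3.Connection, product_id: str, qty: int,
                       max_attempts: int = 3) -> bool:
    """Read-check-write loop that retries on version conflicts instead of failing at once."""
    for _ in range(max_attempts):
        row = read_stock(conn, product_id)
        if row is None or row[0] < qty:  # step 2: application-side sufficiency check
            return False  # genuinely out of stock
        if try_decrement(conn, product_id, qty, row[1]):
            return True
    return False  # still contended after max_attempts; caller can ask the user to retry
```

The version check and the decrement happen in one UPDATE statement, so the database itself decides which of two racing writers wins. No lock is held between the read and the write.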
Distributed Locking with Redis/Zookeeper
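In a microservices architecture, the inventory logic might be handled by multiple instances of an Inventory Service. If all of those instances write to the same database row, the row-level locking above still serializes them. A distributed lock is required when the contended operation spans more than a single database transaction, for example when inventory is sharded across several databases or a reservation also touches other services or caches, because no single database lock then covers the whole operation.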
- Before processing a checkout, the service instance must acquire a lock for each product_id in the cart from a distributed locking service like Redis or Zookeeper.
- A common implementation uses a Redis command like SET product:lock:{product_id} "locked" NX PX 30000. The NX option means the key is only set if it doesn't already exist, making acquisition atomic. The PX option sets an expiry in milliseconds (here 30,000 ms, i.e., 30 seconds), so a lock held by a crashed service instance is eventually released instead of blocking that product forever.
- If the lock is acquired successfully, the service can safely perform the read-modify-write operations on the inventory database.
- After the database transaction is complete, the service must explicitly release the lock by deleting the Redis key. To make this safe, the lock value should be a token unique to the holder rather than a constant like "locked", and the release should delete the key only if it still holds that token. Otherwise, an instance whose lock has already expired could delete a lock that another instance now holds.
Pros: Provides strong consistency across a distributed system, preventing race conditions between multiple service instances. Cons: Introduces a new dependency (Redis/Zookeeper) and adds network latency to the checkout process. It's also more complex to implement correctly. Risks include deadlocks (when locks for several products are acquired in different orders), lock-release failures, and database work that outlives the lock's expiry. All of these need to be handled carefully.
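The acquire and release steps above can be sketched against a redis-py style client, using `set(name, value, nx=True, px=...)` and `eval(script, numkeys, *keys_and_args)`. The client is passed in rather than imported, so the sketch runs without a Redis server. Three choices here go beyond the steps above and should be read as assumptions:

- Each lock stores a random token.
- Release uses the compare-and-delete Lua script pattern described in the Redis documentation for SET-based locks.
- Locks are acquired in sorted product_id order, and everything is released if any lock is busy.

The function names are hypothetical. The sketch targets a single Redis node and does not address failover.

```python
import uuid
from typing import Iterable, List, Optional, Tuple

# Lua script: delete the key only if it still holds our token (atomic compare-and-delete).
RELEASE_SCRIPT = """
if redis.call('GET', KEYS[1]) == ARGV[1] then
    return redis.call('DEL', KEYS[1])
end
return 0
"""


def lock_key(product_id: str) -> str:
    return f"product:lock:{product_id}"


def release_locks(client, held: List[Tuple[str, str]]) -> None:
    """Release each (key, token) pair; keys now owned by someone else are left alone."""
    for key, token in held:
        client.eval(RELEASE_SCRIPT, 1, key, token)


def acquire_locks(client, product_ids: Iterable[str],
                  ttl_ms: int = 30000) -> Optional[List[Tuple[str, str]]]:
    """Try to lock every product in the cart; return the held (key, token) pairs or None.

    Locks are taken in sorted order (a consistent global order), and if any lock is
    busy, every lock acquired so far is released, so two checkouts cannot deadlock.
    """
    held: List[Tuple[str, str]] = []
    for pid in sorted(set(product_ids)):
        key, token = lock_key(pid), uuid.uuid4().hex
        # Equivalent to: SET product:lock:{pid} <token> NX PX <ttl_ms>
        if client.set(key, token, nx=True, px=ttl_ms):
            held.append((key, token))
        else:
            release_locks(client, held)
            return None
    return held
```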
Deep Dive: Search & Filtering
As with Yelp, a simple SQL LIKE query is not sufficient for a modern e-commerce search experience. We need to support typo tolerance, relevance ranking, and faceted search (filtering by brand, price, size, etc.). The best tool for this is a dedicated search engine like Elasticsearch. We would use a Change Data Capture (CDC) pipeline to stream any changes from our Products table in the SQL database to an Elasticsearch cluster, keeping our search index up-to-date in near real-time.
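The search requirements above (typo tolerance, relevance ranking, faceted filters) map onto standard Elasticsearch Query DSL constructs. This is a minimal sketch that builds the request body as a plain dict. Several details are assumptions:

- The field names: name, description, category, brand, and price.
- The boost on name, the page size, and the price bucket boundaries.
- category and brand being keyword fields, so that term filters and terms aggregations work on them.

```python
from typing import Optional

PAGE_SIZE = 20  # assumption: not specified by the API


def build_search_body(query: str, category: Optional[str] = None, page: int = 1,
                      min_price: Optional[float] = None,
                      max_price: Optional[float] = None) -> dict:
    """Build a Query DSL body: fuzzy full-text match, exact filters, and facet counts."""
    filters = []
    if category:
        filters.append({"term": {"category": category}})  # exact match on a keyword field
    price_range = {}
    if min_price is not None:
        price_range["gte"] = min_price
    if max_price is not None:
        price_range["lte"] = max_price
    if price_range:
        filters.append({"range": {"price": price_range}})
    return {
        "from": (page - 1) * PAGE_SIZE,
        "size": PAGE_SIZE,
        "query": {
            "bool": {
                # Scored clause: relevance ranking with typo tolerance.
                "must": [{
                    "multi_match": {
                        "query": query,
                        "fields": ["name^3", "description"],  # boost matches in the product name
                        "fuzziness": "AUTO",
                    }
                }],
                # Non-scoring clauses narrow results without affecting relevance.
                "filter": filters,
            }
        },
        # Facet counts for the filter sidebar.
        "aggs": {
            "brands": {"terms": {"field": "brand"}},
            "price_ranges": {"range": {"field": "price", "ranges": [
                {"to": 50}, {"from": 50, "to": 200}, {"from": 200}]}},
        },
    }
```

The resulting dict would be sent as the JSON body of a search request against the product index. Putting category and price in the filter clause rather than must keeps them out of relevance scoring, and Elasticsearch can cache filter-context clauses.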
Deep Dive: Order Notifications
Keeping customers informed about their order status is crucial for a good experience. This is a perfect use case for an asynchronous messaging system.
Push Notification Architecture
When an order's status changes in the Order Service (e.g., from "processing" to "shipped"), it publishes an event to a message queue like Apache Kafka. A dedicated Notification Service consumes these events. Based on the user's preferences and devices, this service then routes the notification to the appropriate push notification provider.
- APNS (Apple Push Notification Service): For iOS devices.
- FCM (Firebase Cloud Messaging): For Android devices.
- Amazon SNS (Simple Notification Service): Can also be used as a single endpoint to fan out notifications to multiple platforms, including mobile push, SMS, and email.
This decoupled architecture ensures that the Order Service isn't blocked by the process of sending notifications, and it allows the notification system to be scaled independently.
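The Notification Service's routing step can be sketched as a function that handles one consumed event. This is a minimal sketch with the following assumptions:

- The event is JSON with order_id, user_id, and status fields.
- Per-platform senders (wrapping APNS and FCM) and a device lookup are injected.

The Kafka consumer loop, the user-preference check, and real provider clients are left out, and every name here is hypothetical.

```python
import json
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Device:
    platform: str  # e.g. "ios" or "android"
    token: str     # provider-issued device token


# Hypothetical sender signature: (device_token, message) -> None.
Sender = Callable[[str, str], None]


def handle_order_event(raw: bytes,
                       devices_for_user: Callable[[str], List[Device]],
                       senders: Dict[str, Sender]) -> int:
    """Consume one order-status event and fan it out to the user's devices; return pushes sent."""
    event = json.loads(raw)
    message = f"Your order {event['order_id']} is now {event['status']}."
    sent = 0
    for device in devices_for_user(event["user_id"]):
        sender = senders.get(device.platform)  # e.g. "ios" -> APNS wrapper, "android" -> FCM wrapper
        if sender is None:
            continue  # unknown platform: skip it rather than fail the whole event
        sender(device.token, message)
        sent += 1
    return sent
```

Kafka consumers typically get at-least-once delivery, so a production handler would also deduplicate. For example, it could remember which (order_id, status) pairs it has already notified.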