Design a Chat App (WhatsApp)
WhatsApp System Design Requirements
Functional Requirements
One-on-One Chat
Users should be able to send and receive messages in one-on-one conversations.
Group Chat
Users should be able to create and participate in group conversations.
Delivery Receipts
Users should be able to see when a message has been delivered and when it has been read.
Offline Messaging
Users should be able to receive messages even when they are offline.
Non-Functional Requirements
High Availability
The service must be highly available, with minimal downtime.
Low Latency
Messages should be delivered in near real-time.
Scalability
The system must be able to handle millions of concurrent users and billions of messages per day.
Durability
Messages should never be lost.
CAP Theorem Trade-offs
Trade-off Explanation:
For a chat application, we prioritize Availability and Partition Tolerance (AP). It's more important for users to be able to send and receive messages than to have strict consistency. Eventual consistency is acceptable, as a slight delay in message delivery is a better user experience than the service being unavailable.
Scale Estimates
For sizing, we'll work from the scalability requirement above: millions of concurrent users and billions of messages per day. A few billion messages per day averages out to tens of thousands of messages per second, with peaks several times higher.
API Design
The API for a chat application is different from a typical request-response model. While we'll have some standard HTTP endpoints for user management, the core of the messaging functionality will rely on a persistent connection between the client and the server. Your interviewer will expect you to discuss the trade-offs of different real-time communication protocols.
For the purposes of this section, we'll define the HTTP-based endpoints for user and chat management. The real-time messaging itself is handled over a WebSocket or gRPC stream rather than a standard request-response endpoint.
- Register a new user
- Create a new one-on-one or group chat
- Get the message history for a chat
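To make these concrete, here's a minimal sketch of what the three endpoints could look like as Express handlers. The paths, payload fields, and placeholder ids are illustrative assumptions rather than a fixed contract.

```typescript
import express from "express";

const app = express();
app.use(express.json());

// Register a new user (illustrative path and payload).
app.post("/v1/users", (req, res) => {
  const { username, phoneNumber } = req.body;
  // ...persist the user, then return its id
  res.status(201).json({ userId: "u_123", username, phoneNumber });
});

// Create a new one-on-one or group chat.
app.post("/v1/chats", (req, res) => {
  const { memberIds, name } = req.body; // name only applies to group chats
  res.status(201).json({ chatId: "c_456", memberIds, name });
});

// Get the (paginated) message history for a chat.
app.get("/v1/chats/:chatId/messages", (req, res) => {
  const { before, limit = "50" } = req.query; // cursor-based pagination
  res.json({ chatId: req.params.chatId, messages: [], nextCursor: null });
});

app.listen(8080);
```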
Database Schema
The database schema for a chat application needs to be optimized for fast writes (sending messages) and efficient reads (loading chat history). We'll need to store users, chats, messages, and the relationships between them.
We'll use a users table for user information, a chats table to store metadata about each conversation, a chat_members table to link users to chats, and a messages table to store the actual message content. The messages table will be partitioned by chat_id to ensure that all messages for a given chat are stored together for efficient retrieval.
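As a rough sketch, the record shapes for these four tables might look like the following; the specific columns beyond the ids mentioned above are assumptions for illustration.

```typescript
// Illustrative record shapes for the four tables; column names are assumptions.
interface User {
  userId: string;
  username: string;
  phoneNumber: string;
  createdAt: string; // ISO-8601 timestamp
}

interface Chat {
  chatId: string;
  type: "one_on_one" | "group";
  name?: string; // only meaningful for group chats
  createdAt: string;
}

interface ChatMember {
  chatId: string; // links a user to a chat (composite key: chatId + userId)
  userId: string;
  joinedAt: string;
}

interface Message {
  chatId: string;       // partition key: all messages for a chat live together
  messageId: string;    // time-ordered id (e.g. timestamp + sequence) for range reads
  senderId: string;
  body: string;
  sentAt: string;
  deliveredAt?: string; // populated for delivery receipts
  readAt?: string;      // populated for read receipts
}
```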
High-Level Architecture
A chat application's architecture is fundamentally different from a standard web application. It's a distributed system that relies on persistent connections and real-time message passing. Your interviewer will expect you to discuss the trade-offs of different real-time communication protocols and how you would ensure reliable message delivery.
Core Services
- Chat Service: This is the heart of our system. It's a stateful service that maintains a persistent connection with each online user. It's responsible for receiving messages from users and fanning them out to the other participants in a chat.
- Presence Service: This service is responsible for tracking the online status of each user. It will expose an API that the Chat Service can use to determine if a user is online and which server they are connected to.
- Push Notification Service: This service is responsible for sending push notifications to users who are offline. When the Chat Service receives a message for an offline user, it will forward it to this service, which will then send a push notification to the user's device via APNs or FCM.
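As an illustration of the Presence Service, here's a minimal sketch backed by Redis, where each connection heartbeats a key with a TTL so stale entries expire if a client or server crashes. The key naming, TTL, and function names are assumptions.

```typescript
import Redis from "ioredis";

const redis = new Redis();
const HEARTBEAT_TTL_SECONDS = 60; // presence expires if no heartbeat arrives in time

// Called when a user connects and then periodically while the connection is open.
export async function heartbeat(userId: string, serverId: string): Promise<void> {
  // Record which server owns the connection, with a TTL so crashed servers age out.
  await redis.set(`presence:${userId}`, serverId, "EX", HEARTBEAT_TTL_SECONDS);
}

// Called by the Chat Service to choose between real-time delivery and a push notification.
export async function lookupPresence(userId: string): Promise<{ online: boolean; serverId?: string }> {
  const serverId = await redis.get(`presence:${userId}`);
  return serverId ? { online: true, serverId } : { online: false };
}
```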
Deep Dive: Delivering Messages
The core of a chat application is its ability to deliver messages in real-time. This requires a persistent connection between the client and the server. Your interviewer will expect you to discuss the different technologies that can be used to achieve this.
WebSockets
Why: WebSockets provide a full-duplex communication channel over a single TCP connection. This is the most common and efficient way to implement real-time messaging on the web.
How it works: The client establishes a WebSocket connection with the server. This connection remains open, allowing the server to push messages to the client as soon as they are received. This is much more efficient than long polling, as it avoids the overhead of constantly creating new HTTP requests.
Trade-offs:
- Pros: Low latency, efficient use of resources, full-duplex communication.
- Cons: Not supported by some older browsers and proxies.
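Here's a minimal sketch of the server side of this using the Node ws library: one socket per online user, with messages fanned out to recipients connected to the same server. The message shape and the query-parameter authentication shortcut are assumptions.

```typescript
import { WebSocketServer, WebSocket } from "ws";

const wss = new WebSocketServer({ port: 8081 });

// Map of userId -> open socket, so the fan-out path can find online recipients.
const connections = new Map<string, WebSocket>();

wss.on("connection", (socket, req) => {
  // In practice the user would be authenticated during the HTTP upgrade;
  // here we assume the client passes its id as a query parameter.
  const userId = new URL(req.url ?? "/", "http://localhost").searchParams.get("userId")!;
  connections.set(userId, socket);

  socket.on("message", (raw) => {
    const msg = JSON.parse(raw.toString()); // assumed shape: { chatId, recipientIds, body }
    for (const recipientId of msg.recipientIds) {
      // Only recipients connected to *this* server can be reached directly;
      // others are handled by the cross-server delivery mechanisms below.
      connections.get(recipientId)?.send(
        JSON.stringify({ chatId: msg.chatId, from: userId, body: msg.body })
      );
    }
  });

  socket.on("close", () => connections.delete(userId));
});
```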
Long Polling
Why: Long polling is a technique that simulates a server push by having the client make a request to the server that is held open until a message is available. This is a good fallback for clients that don't support WebSockets.
How it works: The client makes an HTTP request to the server. The server holds this request open until it has a message to send to the client. Once the message is sent, the client immediately makes another request.
Trade-offs:
- Pros: Supported by all browsers, simpler to implement than WebSockets.
- Cons: Higher latency than WebSockets, less efficient use of resources.
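A minimal browser-side sketch of the long-polling loop, assuming a hypothetical same-origin /v1/poll endpoint that the server holds open until messages arrive or a timeout elapses.

```typescript
// Client-side long-polling loop (browser, or any runtime with a global fetch).
async function pollForMessages(userId: string): Promise<void> {
  while (true) {
    try {
      // The server holds this request open until it has messages or ~30s pass.
      const res = await fetch(`/v1/poll?userId=${userId}&timeoutSeconds=30`);
      if (res.status === 200) {
        const messages = await res.json();
        for (const msg of messages) {
          console.log("new message", msg);
        }
      }
      // A 204 response means the server timed out with nothing to deliver.
    } catch {
      // Back off briefly on network errors before reconnecting.
      await new Promise((resolve) => setTimeout(resolve, 1000));
    }
    // Immediately issue the next request so there is always one outstanding poll.
  }
}
```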
gRPC Streaming
Why: gRPC is a high-performance RPC framework that supports streaming. It's a great choice for mobile clients, as it's more efficient than WebSockets in terms of battery and network usage.
How it works: gRPC uses HTTP/2 for transport, which allows for bidirectional streaming over a single connection. The client and server can both send a stream of messages to each other. gRPC also uses Protocol Buffers for serialization, which is more efficient than JSON.
Trade-offs:
- Pros: Highly efficient, great for mobile clients, supports bidirectional streaming.
- Cons: Not natively supported by browsers (requires a proxy like Envoy), more complex to set up than WebSockets.
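A rough sketch of a bidirectional-streaming handler with @grpc/grpc-js; the chat.proto file, the ChatService name, and the MessageStream method are illustrative assumptions.

```typescript
import * as grpc from "@grpc/grpc-js";
import * as protoLoader from "@grpc/proto-loader";

// chat.proto, the ChatService name, and the MessageStream method are all assumed.
const packageDefinition = protoLoader.loadSync("chat.proto");
const chatPackage = grpc.loadPackageDefinition(packageDefinition) as any;

const server = new grpc.Server();
server.addService(chatPackage.chat.ChatService.service, {
  // Bidirectional streaming: the client streams outgoing messages up,
  // and the server streams incoming messages and acks back down.
  MessageStream(call: grpc.ServerDuplexStream<any, any>) {
    call.on("data", (msg) => {
      // Persist and fan out to other participants (not shown), then ack.
      call.write({ messageId: msg.messageId, status: "RECEIVED" });
    });
    call.on("end", () => call.end());
  },
});

server.bindAsync("0.0.0.0:50051", grpc.ServerCredentials.createInsecure(), () => {
  server.start();
});
```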
Deep Dive: Ensuring Users Connect to the Right Service
In a distributed chat system, a user can be connected to any one of many chat servers. When another user sends them a message, we need a way to find out which server they are connected to so we can deliver the message. Your interviewer will want to know how you would solve this service discovery problem.
Redis Pub/Sub
Why: Redis Pub/Sub is a simple and effective way to broadcast messages to all chat servers.
How it works: When a chat server receives a message, it publishes the message to a Redis channel that corresponds to the recipient's user_id. All chat servers are subscribed to all channels. When a server receives a message on a channel, it checks whether the recipient is connected to it. If so, it delivers the message.
Trade-offs:
- Pros: Simple to implement, very fast.
- Cons: Not very scalable, as every server has to process every message.
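A sketch of this pattern with the ioredis client, using one channel per recipient and a pattern subscription so every server hears every channel; the channel naming is an assumption.

```typescript
import Redis from "ioredis";

// Sockets for users connected to *this* server (see the WebSocket sketch earlier).
const connections = new Map<string, { send(data: string): void }>();

const pub = new Redis(); // publishing connection
const sub = new Redis(); // a subscribed connection can't run other commands, so keep it separate

// Every server subscribes to every per-user channel via a pattern subscription --
// simple, but each server sees every message, which is the scalability con above.
sub.psubscribe("user:*");

sub.on("pmessage", (_pattern, channel, payload) => {
  const recipientId = channel.slice("user:".length);
  connections.get(recipientId)?.send(payload); // deliver only if the recipient is connected here
});

// Sending side: after persisting a message, publish it to the recipient's channel.
export async function fanOut(recipientId: string, message: object): Promise<void> {
  await pub.publish(`user:${recipientId}`, JSON.stringify(message));
}
```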
Delivery Queue
Why: A more scalable approach is to use a message queue to deliver messages to the correct server.
How it works: We can use a service discovery mechanism (like Zookeeper or a simple Redis hash) to keep track of which server each user is connected to. When a chat server receives a message, it looks up the recipient's server in the service discovery system and then enqueues the message in a queue that is specific to that server. Each server has a consumer that polls its queue for new messages.
Trade-offs:
- Pros: Highly scalable, as each server only has to process the messages that are intended for it.
- Cons: More complex to implement, and the service discovery system becomes a critical dependency: if it goes down, messages can't be routed.
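A sketch of the queue-per-server approach using a Redis hash for service discovery and a Redis list per server as the delivery queue; the key names and SERVER_ID environment variable are assumptions.

```typescript
import Redis from "ioredis";

const redis = new Redis();
const SERVER_ID = process.env.SERVER_ID ?? "chat-server-1"; // this server's identity

// When a user connects, record which server owns their connection.
export async function registerConnection(userId: string): Promise<void> {
  await redis.hset("user_servers", userId, SERVER_ID);
}

// Sending side: look up the recipient's server and enqueue onto that server's queue.
export async function enqueueForRecipient(userId: string, message: object): Promise<void> {
  const serverId = await redis.hget("user_servers", userId);
  if (!serverId) return; // user offline -> handled by the push-notification path
  await redis.lpush(`queue:${serverId}`, JSON.stringify({ userId, message }));
}

// Each server runs a consumer loop that only sees messages addressed to its own users.
export async function consumeLoop(): Promise<void> {
  const blocking = new Redis(); // BRPOP blocks, so use a dedicated connection
  while (true) {
    const popped = await blocking.brpop(`queue:${SERVER_ID}`, 0); // 0 = block indefinitely
    if (!popped) continue;
    const { userId, message } = JSON.parse(popped[1]);
    // Deliver over the recipient's local WebSocket connection (not shown).
  }
}
```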
Deep Dive: Message Storage and Offline Notifications
Store-and-Forward
A key concept in any reliable messaging system is "store-and-forward." This means that when a message is sent, it is first stored in a durable data store before being forwarded to the recipient. This ensures that if the recipient is offline, or if there is a network issue, the message will not be lost. Once the recipient comes back online, they can retrieve the message from the data store.
For our message store, we'll use a NoSQL database like DynamoDB, partitioned by chat_id for efficient retrieval of a chat's message history. All messages will be encrypted at rest to ensure user privacy.
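A sketch of the store-and-forward write with the AWS SDK v3 DynamoDB client, assuming a Messages table keyed by chat_id (partition) and message_id (sort); the table and attribute names are assumptions.

```typescript
import { DynamoDBClient, PutItemCommand } from "@aws-sdk/client-dynamodb";

const dynamo = new DynamoDBClient({});

// Persist the message before attempting delivery (store-and-forward).
export async function storeMessage(
  chatId: string,
  messageId: string,
  senderId: string,
  body: string
): Promise<void> {
  await dynamo.send(
    new PutItemCommand({
      TableName: "Messages",          // assumed table name
      Item: {
        chat_id: { S: chatId },       // partition key: keeps a chat's messages together
        message_id: { S: messageId }, // sort key: time-ordered id for history reads
        sender_id: { S: senderId },
        body: { S: body },            // encrypted at rest via the table's encryption settings
        sent_at: { S: new Date().toISOString() },
      },
    })
  );
}
```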
When a message is sent to an offline user, we need to send them a push notification to let them know they have a new message. We can use a message queue like SQS to handle the delivery of these notifications. When the chat service receives a message for an offline user, it will enqueue a job in the push notification queue. A separate worker service will then dequeue the job and send the push notification via APNs (for iOS) or FCM (for Android).
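A sketch of the offline-notification path with the AWS SDK v3 SQS client: the chat service enqueues a job, and a worker polls the queue and hands each job off to APNs or FCM (provider calls and device-token lookup omitted). The queue URL and job fields are assumptions.

```typescript
import {
  SQSClient,
  SendMessageCommand,
  ReceiveMessageCommand,
  DeleteMessageCommand,
} from "@aws-sdk/client-sqs";

const sqs = new SQSClient({});
const QUEUE_URL = process.env.PUSH_QUEUE_URL!; // assumed to point at the push-notification queue

// Chat service side: enqueue a job when the recipient is offline.
export async function enqueuePush(userId: string, chatId: string, preview: string): Promise<void> {
  await sqs.send(
    new SendMessageCommand({
      QueueUrl: QUEUE_URL,
      MessageBody: JSON.stringify({ userId, chatId, preview }),
    })
  );
}

// Worker side: long-poll the queue and send each notification via APNs/FCM (not shown).
export async function pushWorker(): Promise<void> {
  while (true) {
    const { Messages = [] } = await sqs.send(
      new ReceiveMessageCommand({ QueueUrl: QUEUE_URL, WaitTimeSeconds: 20, MaxNumberOfMessages: 10 })
    );
    for (const m of Messages) {
      const job = JSON.parse(m.Body!);
      // sendToApnsOrFcm(job) -- provider call omitted
      await sqs.send(new DeleteMessageCommand({ QueueUrl: QUEUE_URL, ReceiptHandle: m.ReceiptHandle! }));
    }
  }
}
```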
Complete Design
Now that we've covered all the major components individually, let's look at how everything fits together in our complete WhatsApp system design, following the end-to-end flow from a user sending a message to it being delivered to the recipient.
The complete architecture demonstrates how clients maintain a persistent connection to the Chat Service, which is responsible for fanning out messages to other users. The Presence Service tracks online status, and the Push Notification Service handles offline delivery. This design ensures a reliable, scalable, and real-time messaging experience.