Design Uber
Uber System Design Requirements
Functional Requirements
Rider & Driver Matching
Riders should be able to request a ride and be matched with a nearby available driver in real time.
Real-Time Location Tracking
Riders should be able to see their driver's current location on a map from the moment the driver accepts the ride until the destination is reached.
Fare Calculation & Surge Pricing
The system must calculate the fare for a trip, including dynamic surge pricing based on real-time supply and demand.
Non-Functional Requirements
High Availability
The service must be highly available (99.9% uptime). A user not being able to request a ride is a critical failure.
Low Latency
Matching riders to drivers and updating locations must happen with very low latency (sub-200ms) to ensure a good user experience.
Scalability
The system must be able to scale to handle millions of concurrent users and rides, especially during peak hours in major cities.
CAP Theorem Trade-offs
Trade-off Explanation:
Uber prioritizes Availability and Partition Tolerance. A rider must always be able to request a ride, even if it means some data (like driver location) is slightly stale. Strict consistency is sacrificed for higher availability.
API Design
Request a new ride.
Request Body
{ "rider_id": "string", "start_location": "geo_json", "end_location": "geo_json" }
Response Body
{ "ride_id": "string", "driver_id": "string", "status": "pending", "eta": "5 mins" }
Get the status of a ride.
Response Body
{ "ride_id": "string", "driver_id": "string", "status": "in_progress", "current_location": "geo_json" }
Update a driver's location.
Request Body
{ "driver_id": "string", "location": "geo_json" }
Response Body
{ "status": "success" }
Get a fare estimate.
Request Body
{ "start_location": "geo_json", "end_location": "geo_json" }
Response Body
{ "estimate": "15.50", "surge_multiplier": 1.2 }
Fare Service, Ride Service, and Ride Database (DynamoDB)
The ride database stores the system's core entities:
Ride
Rider
Driver
Geospatial Database (For storing and updating driver Location)
A critical part of Uber's system is efficiently finding and tracking drivers. A standard database with latitude and longitude columns is not suitable for this task because querying for "all drivers within a 5-mile radius" would require a full table scan, which is incredibly slow with millions of drivers. We need a specialized geospatial index to handle these types of queries efficiently. Here are some popular solutions:
Geohashing
Geohashing is a popular technique that encodes a geographic coordinate into a short string of characters. The key advantage is that it turns a 2D location search into a 1D string search. All points within a given rectangular area will have a common prefix. By indexing our drivers on this geohash string, we can quickly find all drivers in a specific area by querying for this prefix. The longer the prefix, the smaller the geographical area, allowing for variable precision.
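Here is a minimal, self-contained sketch of the idea in Python. The encoder below follows the standard geohash algorithm, and an in-memory dictionary stands in for a database table indexed on the geohash column:

BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"  # geohash alphabet (no a, i, l, o)

def encode_geohash(lat: float, lon: float, precision: int = 6) -> str:
    # Interleave longitude and latitude bits (longitude first), emitting one
    # base32 character per 5 bits. Each extra character shrinks the cell.
    lat_lo, lat_hi = -90.0, 90.0
    lon_lo, lon_hi = -180.0, 180.0
    chars, bits, ch = [], 0, 0
    even = True  # True -> longitude bit, False -> latitude bit
    while len(chars) < precision:
        if even:
            mid = (lon_lo + lon_hi) / 2
            if lon >= mid:
                ch = (ch << 1) | 1
                lon_lo = mid
            else:
                ch <<= 1
                lon_hi = mid
        else:
            mid = (lat_lo + lat_hi) / 2
            if lat >= mid:
                ch = (ch << 1) | 1
                lat_lo = mid
            else:
                ch <<= 1
                lat_hi = mid
        even = not even
        bits += 1
        if bits == 5:  # 5 bits per base32 character
            chars.append(BASE32[ch])
            bits, ch = 0, 0
    return "".join(chars)

# In-memory stand-in for a table indexed on the geohash column.
drivers = {
    "driver-1": encode_geohash(37.7749, -122.4194),  # downtown SF
    "driver-2": encode_geohash(37.7793, -122.4193),  # a few blocks away
    "driver-3": encode_geohash(34.0522, -118.2437),  # Los Angeles
}

# "Nearby" becomes a prefix match: a 5-character prefix covers roughly
# a 5 km x 5 km cell.
rider_prefix = encode_geohash(37.7760, -122.4200, precision=5)
nearby = [d for d, gh in drivers.items() if gh.startswith(rider_prefix)]
print(nearby)  # ['driver-1', 'driver-2']

One caveat worth knowing: two points that are physically close but sit on opposite sides of a cell boundary get different prefixes, so production systems typically query the target cell plus its eight neighbors.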
PostGIS
For those who prefer a more traditional database solution, PostGIS is a powerful open-source extension for PostgreSQL. It adds support for geographic objects and allows you to run efficient spatial queries directly in SQL (e.g., ST_DWithin to find all points within a certain distance). It's a mature, feature-rich, and OGC-compliant solution, making it a very strong contender for a production system.
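As a sketch of what this looks like in practice, the query below assumes a hypothetical drivers table with a GEOGRAPHY(Point) column and a GiST spatial index; with the geography type, ST_DWithin takes its radius in meters:

import psycopg2  # assumes a PostgreSQL instance with the PostGIS extension

conn = psycopg2.connect("dbname=uber user=app")  # hypothetical connection string

with conn, conn.cursor() as cur:
    # Hypothetical schema: drivers(id TEXT, location GEOGRAPHY(Point, 4326))
    # with a spatial index: CREATE INDEX ON drivers USING GIST (location);
    cur.execute(
        """
        SELECT id
        FROM drivers
        WHERE ST_DWithin(
            location,
            ST_MakePoint(%s, %s)::geography,  -- rider position (lon, lat)
            %s                                -- radius in meters
        )
        """,
        (-122.4194, 37.7749, 8000),  # roughly a 5-mile radius
    )
    nearby_driver_ids = [row[0] for row in cur.fetchall()]

Because ST_DWithin can use the GiST index, this radius query avoids the full table scan described above.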
QuadTree (Popularized by other system design courses, not recommended)
A QuadTree is a tree data structure that recursively subdivides a 2D space into four quadrants. While it's a classic data structure for spatial indexing and a common topic in computer science courses, it's often less practical for real-world, large-scale systems compared to modern alternatives. Implementing a production-ready QuadTree service is complex, and solutions like PostGIS or H3 are generally more robust and easier to manage.
Uber H3
H3 is a geospatial indexing system developed and open-sourced by Uber. It partitions the world into a grid of hexagons. Hexagons are preferable to squares because they all have neighbors that are roughly equidistant, which simplifies neighborhood analysis (e.g., finding drivers in adjacent areas). H3 provides libraries to easily convert between lat/long coordinates and H3 cell IDs, and to find neighboring cells.
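A small sketch of how this looks with the h3-py bindings (v4 API assumed); the resolution and ring size are illustrative choices:

import h3  # assumes the h3-py bindings (v4 API)

# Index a driver's position into a hexagonal cell.
# Resolution 8 cells average roughly 0.7 km^2.
driver_cell = h3.latlng_to_cell(37.7749, -122.4194, 8)

# To search nearby, take the rider's cell plus k rings of neighbors.
rider_cell = h3.latlng_to_cell(37.7760, -122.4200, 8)
search_cells = h3.grid_disk(rider_cell, 2)  # rider's cell + 2 rings

print(driver_cell in search_cells)  # True: driver is within the search area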
Google S2
Google's S2 library is another powerful geospatial indexing system that maps the 3D sphere of the Earth onto a 2D plane using a space-filling curve. It's used extensively across Google's products, including Google Maps. It provides excellent performance for spatial queries and is designed for global scale, making it another excellent choice for a system like Uber.
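For comparison, here is the same indexing step using s2sphere, a pure-Python port of the S2 library; the cell level chosen is illustrative:

import s2sphere  # assumes the pure-Python s2sphere port of S2

# Map a driver's lat/lng onto an S2 cell along the space-filling curve.
latlng = s2sphere.LatLng.from_degrees(37.7749, -122.4194)
cell_id = s2sphere.CellId.from_lat_lng(latlng)  # leaf cell (level 30)

# Coarser parent cells cover larger areas, much like shorter geohash
# prefixes; level 13 cells are roughly 1 km across.
area_cell = cell_id.parent(13)
print(area_cell.to_token())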
Matching riders to drivers (real-time matching algorithm)
Once a rider requests a trip, the system needs to find the best-suited nearby driver. This is a complex real-time problem. A common approach is to find all available drivers within a certain radius (using our geospatial index) and then rank them based on factors like distance, driver rating, and estimated time of arrival.
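A minimal sketch of the ranking step, with illustrative weights over the factors named above (distance, rating, and ETA):

from dataclasses import dataclass

@dataclass
class Candidate:
    driver_id: str
    distance_km: float   # from the geospatial index
    eta_minutes: float   # from a routing/ETA service
    rating: float        # 1.0 to 5.0

def rank_candidates(candidates: list[Candidate]) -> list[Candidate]:
    # Lower score is better; the weights here are illustrative, not Uber's.
    def score(c: Candidate) -> float:
        return 1.0 * c.eta_minutes + 0.5 * c.distance_km - 0.3 * c.rating
    return sorted(candidates, key=score)

ranked = rank_candidates([
    Candidate("d1", distance_km=1.2, eta_minutes=4, rating=4.9),
    Candidate("d2", distance_km=0.8, eta_minutes=6, rating=4.5),
])
print([c.driver_id for c in ranked])  # ['d1', 'd2']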
To prevent a "race condition" where two riders are matched with the same driver, or one rider is offered to multiple drivers simultaneously, we can use a distributed lock. When the system selects a driver for a ride, it places a temporary lock on that driver and offers them the ride, usually with a short timeout (e.g., 5-15 seconds). If the driver accepts, the match is confirmed. If they decline or the timer expires, the lock is released, and the system moves on to the next best candidate. This ensures that each ride is offered exclusively to one driver at a time.
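One common way to implement such a lock is Redis's atomic SET NX EX, sketched below; the key names and the send_offer_and_wait stub are assumptions for illustration:

import redis  # assumes a Redis instance reachable with default settings

r = redis.Redis()

def try_lock_driver(driver_id: str, ride_id: str, ttl_seconds: int = 15) -> bool:
    # SET NX EX is atomic: it succeeds only if no lock exists, and the TTL
    # guarantees the lock expires on its own if the driver never responds.
    return bool(r.set(f"driver_lock:{driver_id}", ride_id, nx=True, ex=ttl_seconds))

def release_driver(driver_id: str, ride_id: str) -> None:
    # Check-and-delete so we only release a lock we still own. In production
    # this pair of calls should be a single Lua script to stay atomic.
    if r.get(f"driver_lock:{driver_id}") == ride_id.encode():
        r.delete(f"driver_lock:{driver_id}")

def send_offer_and_wait(driver_id: str, ride_id: str) -> bool:
    # Stub (hypothetical): push the offer to the driver's app and block until
    # they accept, decline, or the offer times out.
    return False

def offer_ride(ride_id: str, ranked_driver_ids: list[str]) -> str | None:
    # Walk the ranked candidates, offering the ride exclusively to one
    # driver at a time, exactly as described above.
    for driver_id in ranked_driver_ids:
        if not try_lock_driver(driver_id, ride_id):
            continue  # another ride is already being offered to this driver
        if send_offer_and_wait(driver_id, ride_id):
            return driver_id  # driver accepted; match confirmed
        release_driver(driver_id, ride_id)  # declined or timed out; move on
    return None  # no driver accepted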
Surge pricing (real-time supply/demand estimation)
Surge pricing is Uber's mechanism to balance supply and demand. When demand from riders in a specific area outstrips the supply of available drivers, prices are temporarily increased to incentivize more drivers to move to that area.
To implement this, we need a real-time data pipeline. Events like ride requests, driver location updates, and trip completions are published to a message queue like Apache Kafka. A stream processing engine like Apache Flink can then consume these events.
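A sketch of the producer side using the kafka-python client; the topic names and event schemas are assumptions for illustration:

import json
import time
from kafka import KafkaProducer  # assumes the kafka-python client

producer = KafkaProducer(
    bootstrap_servers="localhost:9092",  # hypothetical broker address
    value_serializer=lambda v: json.dumps(v).encode("utf-8"),
)

# Each event carries its geographic cell so downstream consumers can
# aggregate supply and demand per area.
producer.send("ride_requests", {
    "event": "ride_requested",
    "cell": "9q8yy",  # geohash / H3 cell of the pickup point
    "rider_id": "rider-123",
    "ts": time.time(),
})
producer.send("driver_locations", {
    "event": "location_update",
    "cell": "9q8yy",
    "driver_id": "driver-1",
    "available": True,
    "ts": time.time(),
})
producer.flush()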
Apache Flink Aggregations
Apache Flink allows for powerful real-time aggregations over data streams. For surge pricing, we can use tumbling windows to group events in discrete time intervals (e.g., every minute). Within each window, we can count the number of ride requests and the number of available drivers in each geographic area (defined by a geohash or H3 index). This gives us a real-time view of supply and demand.
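The windowing logic itself is straightforward. The plain-Python sketch below shows what the Flink job computes per window and cell; in production this would be a keyed tumbling-window aggregation inside Flink rather than in-process dictionaries, and the surge thresholds are illustrative:

from collections import defaultdict

WINDOW_SECONDS = 60  # tumbling windows: fixed, non-overlapping 1-minute buckets

def window_start(ts: float) -> int:
    # Align a timestamp to the start of its tumbling window.
    return int(ts // WINDOW_SECONDS) * WINDOW_SECONDS

# (window_start, cell) -> demand and supply seen in that window
ride_requests = defaultdict(int)
available_drivers = defaultdict(set)

def on_event(event: dict) -> None:
    key = (window_start(event["ts"]), event["cell"])
    if event["event"] == "ride_requested":
        ride_requests[key] += 1
    elif event["event"] == "location_update" and event.get("available"):
        available_drivers[key].add(event["driver_id"])  # distinct drivers only

def surge_multiplier(win: int, cell: str) -> float:
    # Map the demand/supply ratio to a multiplier; thresholds are illustrative.
    demand = ride_requests[(win, cell)]
    supply = max(len(available_drivers[(win, cell)]), 1)
    ratio = demand / supply
    if ratio <= 1.0:
        return 1.0  # supply meets demand: no surge
    return round(min(1.0 + 0.25 * (ratio - 1.0), 3.0), 2)  # cap at 3.0x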
The calculated surge multiplier for each geographic area needs to be accessed with very low latency by the fare calculation service. Storing these multipliers in a distributed in-memory cache like Redis is an ideal solution. The Flink job would continuously compute the supply/demand ratios and update the corresponding multipliers in the Redis cache in real-time.
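A sketch of both sides of that cache, assuming Redis keys of the form surge:<cell>; the short TTL means that if the pipeline stalls, stale multipliers expire and fares fall back to the 1.0x default:

import redis

r = redis.Redis()

# Flink side: after each window closes, write the fresh multiplier per cell.
def publish_multiplier(cell: str, multiplier: float) -> None:
    r.set(f"surge:{cell}", multiplier, ex=120)  # expire stale surge data

# Fare service side: a single O(1) cache read on the request path.
def get_multiplier(cell: str) -> float:
    value = r.get(f"surge:{cell}")
    return float(value) if value is not None else 1.0  # default: no surge

publish_multiplier("9q8yy", 1.2)
print(get_multiplier("9q8yy"))  # 1.2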
Complete Design
Now let's put everything together. The complete system design illustrates the end-to-end flow of the Uber ride-sharing service. It starts with the rider's request, which goes through the API gateway to the ride service. The system then uses the geospatial database to find nearby drivers, the matching service to select the best driver, and the fare service (with real-time data from the surge pricing pipeline) to calculate the cost. All ride and user data is stored in a scalable NoSQL database like DynamoDB. This architecture ensures a reliable, scalable, and real-time experience for millions of users.