Designing Uber
2019-01-03 18:39
Disclaimer: All things below are collected from public sources or purely original. No Uber-confidential stuff here.
- ride-hailing service targeting transportation markets around the world
- real-time dispatch at massive scale
- backend design
Conway’s law says that the structure of a software system mirrors the structure of the organization that builds it.
| | Monolithic Service | Microservices |
|---|---|---|
| Productivity, when teams and codebases are small | ✅ High | ❌ Low |
| Productivity, when teams and codebases are large | ❌ Low | ✅ High (Conway’s law) |
| Requirements on engineering quality | ❌ High (under-qualified devs break the system easily) | ✅ Low (runtimes are segregated) |
| Dependency bump | ✅ Fast (centrally managed) | ❌ Slow |
| Multi-tenancy support / production-staging segregation | ✅ Easy | ❌ Hard (each service has to either 1) build a staging environment connected to the other services in staging, or 2) support multi-tenancy across request contexts and data storage) |
| Debuggability, assuming the same modules, metrics, and logs | ❌ Low | ✅ High (with distributed tracing) |
| Latency | ✅ Low (local) | ❌ High (remote) |
| DevOps costs | ✅ Low (but high for building tools) | ❌ High (capacity planning is hard) |
Combining a monolithic codebase with microservices can bring the benefits of both.
- consistent hashing sharded by geohash
- data is transient, in memory, and thus there is no need to replicate. (CAP: AP over CP)
- single-threaded or locked matching in a shard to prevent double dispatching
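The three dispatch points above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not Uber's actual implementation: the `HashRing` and `DispatchShard` classes, the node names, the virtual-node count, and the "same geohash cell" matching rule are all hypothetical. Locations are encoded as geohash cells, a consistent-hash ring maps each cell to a shard, and each shard serializes matching behind a lock so one driver cannot be dispatched twice.

```python
import bisect
import hashlib
import threading

_BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"


def geohash(lat, lon, precision=6):
    """Encode a latitude/longitude pair as a standard base-32 geohash."""
    lat_range, lon_range = [-90.0, 90.0], [-180.0, 180.0]
    chars, bits, bit_count, even = [], 0, 0, True
    while len(chars) < precision:
        # Bits alternate between longitude (even positions) and latitude.
        rng, value = (lon_range, lon) if even else (lat_range, lat)
        mid = (rng[0] + rng[1]) / 2
        if value >= mid:
            bits = (bits << 1) | 1
            rng[0] = mid
        else:
            bits <<= 1
            rng[1] = mid
        even = not even
        bit_count += 1
        if bit_count == 5:
            chars.append(_BASE32[bits])
            bits, bit_count = 0, 0
    return "".join(chars)


class HashRing:
    """Consistent-hash ring with virtual nodes (hypothetical helper)."""

    def __init__(self, nodes, vnodes=64):
        self._ring = sorted(
            (self._hash(f"{node}#{i}"), node)
            for node in nodes
            for i in range(vnodes)
        )
        self._keys = [h for h, _ in self._ring]

    @staticmethod
    def _hash(key):
        return int.from_bytes(hashlib.md5(key.encode()).digest()[:8], "big")

    def node_for(self, key):
        # First ring position clockwise from the key's hash, wrapping around.
        idx = bisect.bisect(self._keys, self._hash(key)) % len(self._keys)
        return self._ring[idx][1]


class DispatchShard:
    """Transient in-memory supply for one shard; matching holds a lock."""

    def __init__(self):
        self._lock = threading.Lock()
        self._available = {}  # driver_id -> geohash cell

    def add_driver(self, driver_id, cell):
        with self._lock:
            self._available[driver_id] = cell

    def match(self, rider_cell):
        # Find and remove the driver atomically to prevent double dispatch.
        with self._lock:
            driver_id = next(
                (d for d, c in self._available.items() if c == rider_cell), None
            )
            if driver_id is not None:
                del self._available[driver_id]
            return driver_id


nodes = ["dispatch-1", "dispatch-2", "dispatch-3"]  # hypothetical node names
ring = HashRing(nodes)
shards = {name: DispatchShard() for name in nodes}

cell = geohash(37.7749, -122.4194)
shard = shards[ring.node_for(cell)]
shard.add_driver("driver-42", cell)
print(shard.match(cell), shard.match(cell))  # the second request finds no driver
```

Because the shard state lives only in memory, losing a node loses its supply data until drivers report their locations again, which is the availability-over-consistency trade-off noted above.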
The key to the payment service is an asynchronous design, because ACID transactions that span multiple systems usually have very long latency.
- leverage event queues
- payment gateway w/ Braintree, PayPal, Card.io, Alipay, etc.
- logging intensively to track everything
- APIs with idempotency, exponential backoff, and random jitter
- low latency with caching
- UserProfile Service faces the challenge of serving a growing number of user types (driver, rider, restaurant owner, eater, etc.) and user schemas that differ across regions and countries.
- Apple Push Notification service (APNs), which is not entirely reliable
- Google Cloud Messaging (GCM), which can detect deliverability
- SMS, which is usually more reliable
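The payment API guidance earlier in this list (idempotency, exponential backoff, and random jitter) can be sketched as follows. This is a minimal illustration under stated assumptions: `FakePaymentGateway`, `TransientError`, and `call_with_backoff` are hypothetical names, not part of any real gateway SDK, and the "full jitter" delay (a uniform random wait up to the exponential cap) is one common variant rather than the only option. The fake gateway applies the charge and then "loses" the response, which is exactly the case where retrying without an idempotency key would charge the customer twice.

```python
import random
import time
import uuid


class TransientError(Exception):
    """Hypothetical error type for retryable failures (timeouts, 5xx)."""


class FakePaymentGateway:
    """Stand-in for a real gateway; deduplicates charges by idempotency key."""

    def __init__(self, failures_before_success=2):
        self._results = {}  # idempotency key -> stored result
        self._failures_left = failures_before_success
        self.charges_applied = 0

    def charge(self, idempotency_key, amount_cents):
        # Apply the charge only the first time a key is seen.
        if idempotency_key not in self._results:
            self.charges_applied += 1
            self._results[idempotency_key] = {
                "status": "ok",
                "amount_cents": amount_cents,
            }
        # Simulate the response being lost after the charge was processed.
        if self._failures_left > 0:
            self._failures_left -= 1
            raise TransientError("response lost after processing")
        return self._results[idempotency_key]


def call_with_backoff(fn, max_attempts=5, base_delay=0.1, max_delay=5.0,
                      sleep=time.sleep):
    """Retry fn on TransientError with exponential backoff and full jitter."""
    for attempt in range(max_attempts):
        try:
            return fn()
        except TransientError:
            if attempt == max_attempts - 1:
                raise
            cap = min(max_delay, base_delay * 2 ** attempt)
            # Random jitter spreads out retries from many clients.
            sleep(random.uniform(0, cap))


gateway = FakePaymentGateway()
key = str(uuid.uuid4())  # generated once, reused on every retry
result = call_with_backoff(lambda: gateway.charge(key, 1250), base_delay=0.01)
print(result["status"], gateway.charges_applied)
```

The important design point is that the idempotency key is created once, outside the retry loop, so every retry refers to the same logical charge.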