Topics 1
Last updated
Last updated
Design principles:
1. Less is more - less element on the web page, fewer data queries to the database, fewer web requests, fewer system dependencies
2. Short critical path - fewer hops among services or merge into one service
3. Async processing- use message queues to handle high TPS
4. Isolation - isolate static and dynamic contents, isolate processes and databases for rare items
5. Overselling is bad. When to decrease the inventory is important
6. User experience is important. We definitely don’t want to inform users that they have successfully placed orders but later tell them no items are actually available
Here is how money moves when you click the Buy button on Amazon or any of your favorite shopping websites.
1. When a user clicks the “Buy” button, a payment event is generated and sent to the payment service.
2. The payment service stores the payment event in the database.
3. Sometimes a single payment event may contain several payment orders. For example, you may select products from multiple sellers in a single checkout process. The payment service will call the payment executor for each payment order.
4. The payment executor stores the payment order in the database.
5. The payment executor calls an external PSP to finish the credit card payment.
6. After the payment executor has successfully executed the payment, the payment service will update the wallet to record how much money a given seller has.
7. The wallet server stores the updated balance information in the database.
8. After the wallet service has successfully updated the seller’s balance information, the payment service will call the ledger to update it.
9. The ledger service appends the new ledger information to the database.
10. Every night the PSP or banks send settlement files to their clients. The settlement file contains the balance of the bank account, together with all the transactions that took place on this bank account during the day.
Stock exchanges use order books. An order book is an electronic list of buy and sell orders, organized by price levels. It has a buy book and a sell book, where each side of the book contains a bunch of price levels, and each price level contains a list of orders (first in first out). The diagram below is an example of price levels and the queued quantity at each price level.
So what happens when you place a market order to buy 2700 shares in the diagram?
- The buy order is matched with all the sell orders at price 100.10, and the first order at price 100.11 (illustrated in light red).
- Now because of the big buy order which “eats up” the first price level on the sell book, the best ask price goes up from 100.10 to 100.11.
- So when the market is bullish, people tend to buy stocks aggressively, and the price goes up and up. An efficient data structure for an order book must satisfy:
- Constant lookup time. Operations include: get volume at a price level or between price levels, query best bid/ask.
- Fast add/cancel/execute/update operations, preferably O(1) time complexity. Operations include: place a new order, cancel an order, and match an order.
How does a modern stock exchange achieve microsecond latency? The principal is:
Do less on the critical path - Fewer tasks on the critical path - Less time on each task - Fewer network hops - Less disk usage For the stock exchange, the critical path is:
start: an order comes into the order manager - mandatory risk checks - the order gets matched and the execution is sent back
end: the execution comes out of the order manager
Other non-critical tasks should be removed from the critical path. We put together a design as shown in the diagram: - deploy all the components in a single giant server (no containers)
- use shared memory as an event bus to communicate among the components, no hard disk
- key components like Order Manager and Matching Engine are single-threaded on the critical path, and each pinned to a CPU so that there is no context switch and no locks
- the single-threaded application loop executes tasks one by one in sequence
- other components listen on the event bus and react accordingly
One picture is worth more than a thousand words. Log4j from attack to prevention in one illustration.
How do we find nearby restaurants on Yelp or Google Maps? Here are some design details behind the scenes. There are two key services (see the diagram below):
There are two key services (see the diagram below):
- Business Service - Add/delete/update restaurant information
- Customers view restaurant details
- Location-based Service - Given a radius and location, return a list of nearby restaurants
How are the restaurant locations stored in the database so that LBS can return nearby restaurants efficiently?
Store the latitude and longitude of restaurants in the database? The query will be very inefficient when you need to calculate the distance between you and every restaurant.
One way to speed up the search is using the geohash algorithm.
First, divide the planet into four quadrants along with the prime meridian and equator:
- Latitude range [-90, 0] is represented by 0 - Latitude range [0, 90] is represented by 1 - Longitude range [-180, 0] is represented by 0 - Longitude range [0, 180] is represented by 1
Second, divide each grid into four smaller grids. Each grid can be represented by alternating between longitude bit and latitude bit.
So when you want to search for the nearby restaurants in the red-highlighted grid, you can write SQL like:
SELECT * FROM geohash_index WHERE geohash LIKE `01%` Geohash has some limitations. There can be a lot of restaurants in one grid (downtown New York), but none in another grid (ocean).
So there are other more complicated algorithms to optimize the process. Let me know if you are interested in the details.
In this post, let’s explore another data structure to find nearby restaurants on Yelp or Google Maps. A quadtree is a data structure that is commonly used to partition a two-dimensional space by recursively subdividing it into four quadrants (grids) until the contents of the grids meet certain criteria (see the first diagram).
A quadtree is an in-memory data structure and it is not a database solution. It runs on each LBS (Location-Based Service, see last week’s post) server, and the data structure is built at server start-up time. The second diagram explains the quadtree building process in more detail. The root node represents the whole world map. The root node is recursively broken down into 4 quadrants until no nodes are left with more than 100 businesses.
How to get nearby businesses with quadtree? - Build the quadtree in memory.
- After the quadtree is built, start searching from the root and traverse the tree, until we find the leaf node where the search origin is.
- If that leaf node has 100 businesses, return the node. Otherwise, add businesses from its neighbors until enough businesses are returned.
Update LBS server and rebuild quadtree - It may take a few minutes to build a quadtree in memory with 200 million businesses at the server start-up time.
- While the quadtree is being built, the server cannot serve traffic.
- Therefore, we should roll out a new release of the server incrementally to a small subset of servers at a time. This avoids taking a large swathe of the server cluster offline and causes service brownout.
One pro tip for acing a system design interview is to read the engineering blog of the company you are interviewing with. You can get a good sense of what technology they use, why the technology was chosen over others, and learn what issues are important to engineers.
For example, here are 4 blog posts Twitter Engineering recommends:
1. The Infrastructure Behind Twitter: Scale
2. Discovery and Consumption of Analytics Data at Twitter
3. The what and why of product experimentation at Twitter
4. Twitter experimentation: technical overview
A few weeks ago, I posted the high-level design for the payment system. Today, I’ll continue the discussion and focus on payment security. The table below summarizes techniques that are commonly used in payment security.
One of the most serious problems a payment system can have is to double charge a customer. When we design the payment system, it is important to guarantee that the payment system executes a payment order exactly-once.
At the first glance, exactly-once delivery seems very hard to tackle, but if we divide the problem into two parts, it is much easier to solve. Mathematically, an operation is executed exactly-once if:
1. It is executed at least once.
2. At the same time, it is executed at most once.
We now explain how to implement at least once using retry and at most once using idempotency check.
Retry Occasionally, we need to retry a payment transaction due to network errors or timeout. Retry provides the at-least-once guarantee. For example, as shown in Figure 10, the client tries to make a $10 payment, but the payment keeps failing due to a poor network connection. Considering the network condition might get better, the client retries the request and this payment finally succeeds at the fourth attempt.
Idempotency From an API standpoint, idempotency means clients can make the same call repeatedly and produce the same result.
For communication between clients (web and mobile applications) and servers, an idempotency key is usually a unique value that is generated by clients and expires after a certain period of time. A UUID is commonly used as an idempotency key and it is recommended by many tech companies such as Stripe and PayPal. To perform an idempotent payment request, an idempotency key is added to the HTTP header: <idempotency-key: key_value>.