The diagram below shows the process. Note that architectures vary across databases; the diagram illustrates some common designs.
Step 1 - A SQL statement is sent to the database via a transport layer protocol (e.g.TCP).
Step 2 - The SQL statement is sent to the command parser, where it goes through syntactic and semantic analysis, and a query tree is generated afterward.
Step 3 - The query tree is sent to the optimizer. The optimizer creates an execution plan.
Step 4 - The execution plan is sent to the executor. The executor retrieves data according to the execution plan.
Step 5 - Access methods provide the data fetching logic required for execution, retrieving data from the storage engine.
Step 6 - Access methods decide whether the SQL statement is read-only. If the query is read-only (SELECT statement), it is passed to the buffer manager for further processing. The buffer manager looks for the data in the cache or data files.
Step 7 - If the statement is an UPDATE or INSERT, it is passed to the transaction manager for further processing.
Step 8 - During a transaction, the data is in lock mode. This is guaranteed by the lock manager. It also ensures the transaction’s ACID properties.
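The eight steps above can be sketched as a simplified pipeline. This is a toy illustration; all class and function names here are invented, and real database engines are far more elaborate.

```python
# Toy sketch of the SQL execution flow described above.
# All names are illustrative; real engines differ in structure and detail.

def parse(sql: str) -> dict:
    """Command parser: syntactic/semantic analysis -> query tree (step 2)."""
    verb, _, rest = sql.partition(" ")
    return {"op": verb.upper(), "rest": rest}

def optimize(query_tree: dict) -> dict:
    """Optimizer: pick an execution plan, e.g. index scan vs. full scan (step 3)."""
    return {"plan": "index_scan", "tree": query_tree}

class BufferManager:
    """Looks for data in the in-memory cache before touching data files (step 6)."""
    def __init__(self):
        self.cache = {"products": ["apple", "pear"]}
    def read(self, table):
        return self.cache.get(table, [])  # a real engine falls back to data files

def execute(plan: dict, buffers: BufferManager):
    """Executor + access methods: read-only queries go to the buffer manager;
    writes would go through the transaction and lock managers (steps 4-8)."""
    tree = plan["tree"]
    if tree["op"] == "SELECT":
        table = tree["rest"].split()[-1]   # naive "SELECT * FROM <table>"
        return buffers.read(table)
    raise NotImplementedError("writes go through the transaction/lock managers")

rows = execute(optimize(parse("SELECT * FROM products")), BufferManager())
```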
This post draws from an article published on Netflix’s engineering blog. Here’s my understanding of how the online streaming giant’s system works.
Requirements & scale
220 million users
Near real-time
Backend systems need to send notifications to various clients
Supported clients: iOS, Android, smart TVs, Roku, Amazon FireStick, web browser
The life of a push notification
Push notification events are triggered by the clock, user actions, or by systems.
Events are sent to the event management engine.
The event management engine listens for specific events and forwards them to different queues based on priority-based forwarding rules.
The “event priority-based processing cluster” processes events and generates push notifications data for devices.
A Cassandra database is used to store the notification data.
A push notification is sent to outbound messaging systems.
For Android, FCM (Firebase Cloud Messaging) is used to send push notifications. For Apple devices, APNs (Apple Push Notification service) is used. For web, TV, and other streaming devices, Netflix’s homegrown solution called ‘Zuul Push’ is used.
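The priority-based queueing and outbound routing described above can be sketched as follows. The event types, priority values, and routing rules here are made up for illustration; Netflix’s actual rules are not public.

```python
import heapq

# Toy sketch of priority-based event routing and outbound dispatch.
# Event names and priorities are illustrative, not Netflix's real rules.
PRIORITY = {"new_episode": 1, "billing_reminder": 2, "marketing": 3}  # lower = more urgent

queue = []  # one priority queue standing in for multiple per-priority queues
seq = 0     # tie-breaker so heapq never compares event dicts

def enqueue(event):
    global seq
    heapq.heappush(queue, (PRIORITY.get(event["type"], 3), seq, event))
    seq += 1

def dispatch(event):
    """Route a notification to the right outbound messaging system."""
    platform = event["platform"]
    if platform == "android":
        return "FCM"
    if platform == "ios":
        return "APNs"
    return "Zuul Push"   # web, TVs, other streaming devices

enqueue({"type": "marketing", "platform": "web"})
enqueue({"type": "new_episode", "platform": "android"})
_, _, first = heapq.heappop(queue)   # highest-priority event comes out first
```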
Amazon DynamoDB handled 89 million requests per second during Prime Day. How does it do that?
The diagram below is my attempt to draw the latest architecture of DynamoDB based on the 2022 paper. I also try to explain why certain choices were made. Please leave a comment if you spot any mistakes.
REST is the most common communication standard between computers over the internet. What is it? Why is it so popular?
The common API standard used by most mobile and web applications to talk to the servers is called REST. It stands for REpresentational State Transfer.
REST is not a specification. It is a loose set of rules that has been the de facto standard for building web APIs since the early 2000s.
An API that follows the REST standard is called a RESTful API. Some real-life examples are Twilio, Stripe, and Google Maps.
Let’s look at the basics of REST. A RESTful API organizes resources into a set of unique URIs or Uniform Resource Identifiers.
The resources should be grouped by noun and not verb. For example, an API to get all products should be GET /products, not GET /getAllProducts.
A client interacts with a resource by making a request to the endpoint for the resource over HTTP. The request has a very specific format.
POST /products HTTP/1.1
The first line contains the URI for the resource we’d like to access. The URI is preceded by an HTTP verb which tells the server what we want to do with the resource.
You might have heard of the acronym CRUD. It stands for Create, Read, Update, and Delete, which map to the HTTP verbs POST, GET, PUT, and DELETE.
These requests may also include an optional HTTP request body that contains a custom payload of data, usually encoded in JSON.
The server receives the request, processes it, and formats the result into a response.
The first line of the response contains the HTTP status code to tell the client what happened to the request.
A well-implemented RESTful API returns proper HTTP status codes.
A well-behaved client could choose to retry a request that failed with a 500-level status code.
We said “could choose to retry” because some actions are not idempotent and those require extra care when retrying. When an API is idempotent, making multiple identical requests has the same effect as making a single request. This is usually not the case for a POST request to create a new resource.
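One common way to make a POST safe to retry is a client-generated idempotency key, a technique used by APIs such as Stripe. The sketch below is illustrative; the class and field names are invented.

```python
import uuid

# Sketch of a retry-safe POST using a client-generated idempotency key.
# All names are illustrative, not a real framework's API.
class Server:
    def __init__(self):
        self.resources = {}
        self.seen_keys = {}

    def create(self, idempotency_key, payload):
        # Replay the stored response instead of creating a duplicate resource.
        if idempotency_key in self.seen_keys:
            return self.seen_keys[idempotency_key]
        resource_id = len(self.resources) + 1
        self.resources[resource_id] = payload
        response = {"status": 201, "id": resource_id}
        self.seen_keys[idempotency_key] = response
        return response

server = Server()
key = str(uuid.uuid4())
first = server.create(key, {"name": "widget"})
retry = server.create(key, {"name": "widget"})   # e.g. after a timeout or 500
```

With the key, the retried request returns the original response instead of creating a second resource.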
The response body is optional; when present, it contains the data payload, usually formatted in JSON.
There is a critical attribute of REST that is worth discussing more.
A REST implementation should be stateless. It means that the two parties don’t need to store any information about each other, and every request and response is independent of all others.
This leads to web applications that are easy to scale and well-behaved.
There are two finer points to discuss to round out a well-behaved RESTful API.
If an API endpoint returns a huge amount of data, use pagination.
A common pagination scheme uses limit and offset. Here is an example:
If they are not specified, the server should assume sensible default values.
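A minimal limit/offset pagination handler might look like the sketch below. The default values and the cap on page size are assumptions, not part of any standard.

```python
# Minimal limit/offset pagination sketch; defaults and caps are assumptions.
PRODUCTS = [f"product-{i}" for i in range(1, 101)]

def list_products(limit=10, offset=0):
    """Handles GET /products?limit=...&offset=... with sensible defaults."""
    limit = max(1, min(limit, 100))      # cap the page size
    return PRODUCTS[offset:offset + limit]

page1 = list_products()                  # defaults: limit=10, offset=0
page3 = list_products(limit=10, offset=20)
```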
Lastly, versioning of an API is very important.
Versioning allows an implementation to provide backward compatibility so that if we introduce breaking changes from one version to another, consumers get enough time to move to the next version.
There are many ways to version an API. The most straightforward is to prefix the version before the resource on the URI, for instance: /v1/products.
There are other popular API options like GraphQL and gRPC. We will discuss those and compare them separately.
If HTTPS is safe, how can tools like Fiddler capture network packets sent via HTTPS?
The diagram below shows a scenario where a malicious intermediate hijacks the packets.
Prerequisite: root certificate of the intermediate server is present in the trust-store.
Step 1 - The client requests to establish a TCP connection with the server. The request is maliciously routed to an intermediate server, instead of the real backend server. Then, a TCP connection is established between the client and the intermediate server.
Step 2 - The intermediate server establishes a TCP connection with the actual server.
Step 3 - The intermediate server sends the SSL certificate to the client. The certificate contains the public key, hostname, expiry dates, etc. The client validates the certificate.
Step 4 - The legitimate server sends its certificate to the intermediate server. The intermediate server validates the certificate.
Step 5 - The client generates a session key and encrypts it using the public key from the intermediate server. The intermediate server receives the encrypted session key and decrypts it with the private key.
Step 6 - The intermediate server encrypts the session key using the public key from the actual server and then sends it there. The legitimate server decrypts the session key with the private key.
Steps 7 and 8 - Now, the client and the server can communicate using the session key (symmetric encryption). The encrypted data is transmitted in a secure bi-directional channel. The intermediate server can always decrypt the data.
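The key-relaying trick in steps 5 and 6 can be simulated in a few lines. This is a toy model with no real cryptography: “encrypting with a public key” is represented as tagging the data with the key’s name.

```python
# Toy simulation of the interception above. No real cryptography here:
# wrapping with a public key is modeled by pairing data with the key name.
class Party:
    def __init__(self, name):
        self.name = name
        self.public_key = f"{name}-pub"
        self.private_key = f"{name}-priv"

def wrap(session_key, public_key):
    return (public_key, session_key)     # stands in for RSA encryption

def unwrap(blob, private_key):
    public_key, session_key = blob       # only the matching key pair can unwrap
    assert public_key.rsplit("-", 1)[0] == private_key.rsplit("-", 1)[0]
    return session_key

session_key = "session-123"
intermediate, server = Party("proxy"), Party("realserver")

# Step 5: the client wraps the session key with the *intermediate's* public key,
# because it trusts the intermediate's certificate (its root CA is in the trust store).
blob = wrap(session_key, intermediate.public_key)
stolen = unwrap(blob, intermediate.private_key)   # the proxy now knows the session key

# Step 6: the intermediate re-wraps the key for the real server.
relayed = unwrap(wrap(stolen, server.public_key), server.private_key)
```

Because the intermediate holds the session key, it can decrypt everything in steps 7 and 8.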
Do you know how to explain to a 10-year-old what all the symbols/numbers on the smart credit card mean?
Do you know that smart credit cards have ISO standards? Let’s take a look:
ISO 7813: defines the card size and shape
ISO 7816: defines smart card integrated chips, such as the EMV (Europay, Mastercard, and Visa) chip
ISO 7812: defines the PAN (primary account number) structure
ISO 7811: defines the magnetic stripe details
ISO 14443: defines contactless cards
Please note those are not accurate numbers. They are based on some online benchmarks (Jeff Dean’s latency numbers + some other sources).
L1 and L2 caches: 1 ns, 10 ns E.g.: They are usually built onto the microprocessor chip. Unless you work with hardware directly, you probably don’t need to worry about them.
RAM access: 100 ns E.g.: It takes around 100 ns to read data from memory. Redis is an in-memory data store, so it takes about 100 ns to read data from Redis.
Send 1K bytes over 1 Gbps network: 10 us E.g.: It takes around 10 us to send 1KB of data from Memcached through the network.
Read from SSD: 100 us E.g.: RocksDB is a disk-based K/V store, so the read latency is around 100 us on SSD.
Database insert operation: 1 ms. E.g.: A PostgreSQL commit might take 1 ms. The database needs to store the data, create the index, and flush logs. All these actions take time.
Send packet CA->Netherlands->CA: 100 ms E.g.: If we have a long-distance Zoom call, the latency might be around 100 ms.
Retry/refresh interval: 1-10 s E.g.: In a monitoring system, the refresh interval is usually set to 5~10 seconds (default value on Grafana).
Notes:
1 ns = 10^-9 seconds
1 us = 10^-6 seconds = 1,000 ns
1 ms = 10^-3 seconds = 1,000 us = 1,000,000 ns
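To compare the numbers above, it helps to normalize everything to nanoseconds:

```python
# The ballpark latency numbers above, normalized to nanoseconds.
NS, US, MS = 1, 1_000, 1_000_000

latency_ns = {
    "L1 cache": 1 * NS,
    "L2 cache": 10 * NS,
    "RAM access": 100 * NS,
    "1KB over 1 Gbps network": 10 * US,
    "SSD read": 100 * US,
    "DB insert": 1 * MS,
    "CA->Netherlands->CA round trip": 100 * MS,
}

# A database insert is roughly 10,000x slower than a RAM access.
ratio = latency_ns["DB insert"] // latency_ns["RAM access"]
```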
Quick quiz:
1). Do you know them all? 2). Nowadays, both disk and tape are used for data backup. Do you know which one has a higher write speed?
The diagram below shows a typical microservice architecture.
Load Balancer: This distributes incoming traffic across multiple API gateway instances for high availability.
CDN (Content Delivery Network): CDN is a group of geographically distributed servers that hold static content for faster delivery. The clients look for content in CDN first, then progress to backend services.
API Gateway: This handles incoming requests and routes them to the relevant services. It talks to the identity provider and service discovery.
Identity Provider: This handles authentication and authorization for users.
Service Registry & Discovery: Microservice registration and discovery happen in this component, and the API gateway looks for relevant services in this component to talk to.
Management: This component is responsible for monitoring the services.
Microservices: Microservices are designed and deployed in different domains. Each domain has its own database. The API gateway talks to the microservices via REST API or other protocols, and the microservices within the same domain talk to each other using RPC (Remote Procedure Call).
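The registry-and-discovery component above can be illustrated with a toy sketch. All names and addresses are invented; real systems (e.g. Eureka, Consul) add health checks, TTLs, and load balancing.

```python
# Minimal service registry/discovery sketch; names and addresses are invented.
class ServiceRegistry:
    def __init__(self):
        self.instances = {}            # service name -> list of instance addresses

    def register(self, name, address):
        """A microservice instance announces itself on startup."""
        self.instances.setdefault(name, []).append(address)

    def discover(self, name):
        """The API gateway resolves a service name before routing a request."""
        addrs = self.instances.get(name)
        if not addrs:
            raise LookupError(f"no instances of {name}")
        return addrs[0]                # a real gateway would load-balance here

registry = ServiceRegistry()
registry.register("orders", "10.0.0.5:8080")
registry.register("orders", "10.0.0.6:8080")
target = registry.discover("orders")
```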
Benefits of microservices:
They can be quickly designed, deployed, and horizontally scaled.
Each domain can be independently maintained by a dedicated team.
Business requirements can be customized in each domain and better supported, as a result.
Quick questions: 1). What are the drawbacks of the microservice architecture? 2). Have you seen a monolithic system be transformed into a microservice architecture? How long does it take?
Big accounts, such as Nike, Procter & Gamble, and Nintendo, often cause hotspot issues for the payment system.
A hotspot payment account is an account that has a large number of concurrent operations on it.
For example, when merchant A starts a promotion on Amazon Prime Day, it receives many concurrent purchasing orders. In this case, the merchant’s account in the database becomes a hotspot account due to frequent updates.
In normal operations, we put a row lock on the merchant’s balance when it gets updated. However, this locking mechanism leads to low throughput and becomes a system bottleneck.
The diagram below shows several optimizations.
Rate limit: We can limit the number of requests within a certain period. The remaining requests will be rejected or retried at a later time. It is a simple way to increase the system’s responsiveness for some users, but this can lead to a bad user experience.
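A fixed-window counter is one simple way to implement the rate limit just described. The limit and window values below are arbitrary example numbers.

```python
import time

# Fixed-window rate limiter sketch matching the description above;
# the limit and window length are arbitrary example values.
class RateLimiter:
    def __init__(self, limit, window_seconds):
        self.limit = limit
        self.window = window_seconds
        self.window_start = time.monotonic()
        self.count = 0

    def allow(self):
        now = time.monotonic()
        if now - self.window_start >= self.window:
            self.window_start, self.count = now, 0   # start a new window
        if self.count < self.limit:
            self.count += 1
            return True
        return False   # caller rejects the request or retries later

limiter = RateLimiter(limit=3, window_seconds=60)
results = [limiter.allow() for _ in range(5)]   # [True, True, True, False, False]
```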
Split the balance account into sub-accounts: We can set up sub-accounts for the merchant’s account. In this way, one update request only locks one sub-account, and the remaining sub-accounts are still available.
Use cache to update balance first: We can set up a caching layer to update the merchant’s balance. The detailed statements and balances are updated in the database later asynchronously. The in-memory cache can deal with a much higher throughput than the database.
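The sub-account optimization can be sketched as follows: each shard of the balance has its own lock, so concurrent credits rarely contend. Class and method names are invented for illustration.

```python
import random
import threading

# Sketch of splitting a hot merchant balance into sub-accounts, each with
# its own lock, so concurrent updates rarely contend. Names are invented.
class ShardedBalance:
    def __init__(self, shards=10):
        self.balances = [0] * shards
        self.locks = [threading.Lock() for _ in range(shards)]

    def credit(self, amount):
        i = random.randrange(len(self.balances))   # pick any sub-account
        with self.locks[i]:                        # locks one shard, not the whole account
            self.balances[i] += amount

    def total(self):
        return sum(self.balances)                  # the read path aggregates sub-accounts

account = ShardedBalance()
for _ in range(100):
    account.credit(5)
```

The trade-off is that reading the exact total now requires aggregating all sub-accounts.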
Quick question: We can also put the requests into a message queue so the requests can be processed at the service’s own pace. Can you think of the limitations of this approach?
What happens behind the scenes when we shop online?
Disclaimer: I have limited knowledge of the eCommerce system. The diagram below is based on my research. Please suggest better names for the components or let me know if you spot an error.
The diagram below shows the 4 key business areas in a typical e-commerce company: procurement, inventory, eComm platform, and transportation.
1️⃣ Procurement
Step 1 - The procurement department selects suppliers and manages contracts with them.
Step 2 - The procurement department places orders with suppliers, manages the return of goods, and settles invoices with suppliers.
2️⃣ Inventory
Step 3 - The products or goods from suppliers are delivered to a storage facility. All products/goods are managed by inventory management systems.
3️⃣ eComm platform
Steps 4-7 - The product management system creates the product info. The pricing system prices the products. Then the products are ready to be listed for sale. The promotion system defines big sale activities, coupons, etc.
Steps 8-11 - Consumers can now purchase products on the e-commerce app. First, users register or log in to the app. Next, users browse the product list and details, adding products to the shopping cart. They then place purchasing orders.
Steps 12,13 - The order management system reserves stock in the inventory management system. Then the users pay for the product.
4️⃣ Transportation
Steps 14,15 - The inventory system sends the outbound order to the transportation system, which manages the physical delivery of the goods.
Step 16 - Sign for item delivery (optional)
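The reserve-then-pay flow in steps 12-13 can be sketched as below. The class, SKU, and order identifiers are invented for illustration; real inventory systems also handle reservation timeouts and compensation.

```python
# Sketch of the reserve-then-pay flow (steps 12-13); all names are invented.
class Inventory:
    def __init__(self, stock):
        self.stock = stock
        self.reserved = {}

    def reserve(self, order_id, sku, qty):
        """Hold stock for an order before the user pays."""
        if self.stock.get(sku, 0) < qty:
            return False               # out of stock: the order fails fast
        self.stock[sku] -= qty
        self.reserved[order_id] = (sku, qty)
        return True

    def release(self, order_id):
        """Return held stock, e.g. when payment fails or times out."""
        sku, qty = self.reserved.pop(order_id)
        self.stock[sku] += qty

inventory = Inventory({"sku-1": 10})
ok = inventory.reserve("order-42", "sku-1", 3)   # stock is held before payment
```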
Quick question: If a user buys many products, their big order might be divided into several small orders based on warehouse locations, product types, etc. Where would you place the “order splitting” system in the process outlined above?