How do modern browsers work?
Google published a series of articles titled "Inside look at modern web browser". It's a great read.
https://developer.chrome.com/blog/inside-browser-part1/
https://developer.chrome.com/blog/inside-browser-part2/
https://developer.chrome.com/blog/inside-browser-part3/
https://developer.chrome.com/blog/inside-browser-part4/
The diagram below illustrates the API timeline and API styles comparison.
Over time, different API architectural styles have been released. Each has its own patterns for standardizing data exchange.
You can check out the use cases of each style in the diagram.
Why is Kafka fast?
Kafka achieves low-latency message delivery through sequential I/O and the zero-copy principle. The same techniques are commonly used in many other messaging/streaming platforms.
The diagram below illustrates how the data is transmitted between producer and consumer, and what zero-copy means.
🔹Step 1.1 - 1.3: The producer writes data to the disk.
🔹Step 2: The consumer reads data without zero-copy:
2.1 The data is loaded from disk to the OS cache.
2.2 The data is copied from the OS cache to the Kafka application.
2.3 The Kafka application copies the data into the socket buffer.
2.4 The data is copied from the socket buffer to the network card.
2.5 The network card sends the data out to the consumer.
🔹Step 3: The consumer reads data with zero-copy:
3.1 The data is loaded from disk to the OS cache.
3.2 The OS cache copies the data directly to the network card via the sendfile() system call.
3.3 The network card sends the data out to the consumer.
Zero copy is a shortcut that avoids the multiple data copies between application context and kernel context. This approach brings the time down by approximately 65%.
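To make the difference concrete, here is a minimal Python sketch, not Kafka's actual implementation, contrasting the traditional read/write copy path with a zero-copy transfer via os.sendfile(). The file path and socket are illustrative placeholders.

```python
import os
import socket

CHUNK = 64 * 1024  # illustrative chunk size

def send_without_zero_copy(path: str, conn: socket.socket) -> None:
    """Traditional path: disk -> OS cache -> application buffer -> socket buffer -> NIC."""
    with open(path, "rb") as f:
        while True:
            data = f.read(CHUNK)   # copies data from the OS cache into the application
            if not data:
                break
            conn.sendall(data)     # copies data from the application into the socket buffer

def send_with_zero_copy(path: str, conn: socket.socket) -> None:
    """Zero-copy path: the kernel moves data from the OS cache toward the socket directly."""
    with open(path, "rb") as f:
        size = os.fstat(f.fileno()).st_size
        offset = 0
        while offset < size:
            # os.sendfile() asks the kernel to transfer the bytes without ever
            # surfacing them in user space (available on Linux; semantics vary by OS).
            sent = os.sendfile(conn.fileno(), f.fileno(), offset, size - offset)
            if sent == 0:
                break
            offset += sent
```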
How to scale a website to support millions of users? We will explain this step-by-step.
The diagram below illustrates the evolution of a simplified eCommerce website. It goes from a monolithic design on one single server, to a service-oriented/microservice architecture.
Suppose we have two services: inventory service (handles product descriptions and inventory management) and user service (handles user information, registration, login, etc.).
Step 1 - With the growth of the user base, a single server cannot handle the traffic anymore, so we put the application and the database on two separate servers.
Step 2 - The business continues to grow, and a single application server is no longer enough. So we deploy a cluster of application servers.
Step 3 - Now that incoming requests have to be routed to multiple application servers, how can we ensure each server receives an even share of the load? A load balancer handles this nicely.
Step 4 - With the business continuing to grow, the database might become the bottleneck. To mitigate this, we separate reads and writes so that frequent read queries go to read replicas. With this setup, the write throughput of the primary database can be greatly increased.
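As a rough illustration of step 4, the sketch below routes writes to the primary and reads to a randomly chosen replica. The connections and the execute() helper are hypothetical placeholders, not a specific database client's API.

```python
import random

class ReadWriteRouter:
    """Route writes to the primary database and reads to one of the read replicas."""

    def __init__(self, primary, replicas):
        self.primary = primary    # connection to the primary database
        self.replicas = replicas  # connections to the read replicas

    def execute(self, sql: str, params=()):
        # A very naive check: anything that is not a SELECT goes to the primary.
        if sql.lstrip().upper().startswith("SELECT"):
            conn = random.choice(self.replicas)
        else:
            conn = self.primary
        return conn.execute(sql, params)

# Usage (connections are placeholders for real database clients):
# router = ReadWriteRouter(primary_conn, [replica1_conn, replica2_conn])
# router.execute("SELECT * FROM inventory WHERE product_id = ?", (42,))
```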
Step 5 - Suppose the business continues to grow. One single database cannot handle the load on both the inventory table and user table. We have a few options:
Vertical scaling: adding more power (CPU, RAM, etc.) to the database server. This has a hard limit.
Horizontal partitioning (sharding): adding more database servers (see the sketch after this list).
Adding a caching layer to offload read requests.
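For the horizontal partitioning option, a common approach is to route each row to a shard by hashing its partition key. The sketch below is purely illustrative; the shard connections and query are hypothetical placeholders.

```python
import hashlib

class ShardRouter:
    """Pick a database shard by hashing the partition key (e.g. user_id)."""

    def __init__(self, shards):
        self.shards = shards  # list of connections, one per shard

    def shard_for(self, key) :
        # Stable hash so the same key always maps to the same shard.
        digest = hashlib.md5(str(key).encode()).hexdigest()
        index = int(digest, 16) % len(self.shards)
        return self.shards[index]

# Usage (shard connections are placeholders):
# router = ShardRouter([shard0, shard1, shard2])
# conn = router.shard_for(user_id)
# conn.execute("SELECT * FROM users WHERE user_id = ?", (user_id,))
```

Note that plain modulo hashing forces a lot of data movement when the number of shards changes; consistent hashing is often used to reduce that.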
Step 6 - Now we can modularize the functions into different services. The architecture becomes service-oriented / microservice.
HTTP 1.0 -> HTTP 1.1 -> HTTP 2.0 -> HTTP 3.0 (QUIC).
What problem does each generation of HTTP solve?
The diagram below illustrates the key features.
🔹HTTP 1.0 was finalized and fully documented in 1996. Every request to the same server requires a separate TCP connection.
🔹HTTP 1.1 was published in 1997. A TCP connection can be left open for reuse (persistent connection), but it doesn’t solve the HOL (head-of-line) blocking issue.
HOL blocking - when the number of allowed parallel requests in the browser is used up, subsequent requests need to wait for the former ones to complete.
🔹HTTP 2.0 was published in 2015. It addresses the HOL issue through request multiplexing, which eliminates HOL blocking at the application layer, but HOL blocking still exists at the transport (TCP) layer.
As you can see in the diagram, HTTP 2.0 introduced the concept of HTTP “streams”: an abstraction that allows multiplexing different HTTP exchanges onto the same TCP connection. Each stream doesn’t need to be sent in order.
🔹HTTP 3.0 first draft was published in 2020. It is the proposed successor to HTTP 2.0. It uses QUIC instead of TCP for the underlying transport protocol, thus removing HOL blocking in the transport layer.
QUIC is based on UDP. It introduces streams as first-class citizens at the transport layer. QUIC streams share the same QUIC connection, so no additional handshakes and slow starts are required to create new ones, but QUIC streams are delivered independently such that in most cases packet loss affecting one stream doesn't affect others.
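To see HTTP/2 multiplexing from client code, here is a small sketch using the third-party httpx library (assuming it is installed with its http2 extra). With http2=True, concurrent requests to the same origin share one TCP connection as independent streams instead of queuing behind each other.

```python
import asyncio
import httpx  # third-party: pip install "httpx[http2]"

async def fetch_many(urls):
    # With http2=True, requests to the same origin are multiplexed as
    # independent streams over a single TCP (or TLS) connection, so a slow
    # response does not block the others at the application layer.
    async with httpx.AsyncClient(http2=True) as client:
        responses = await asyncio.gather(*(client.get(u) for u in urls))
        for resp in responses:
            print(resp.http_version, resp.url, resp.status_code)

# Example (any HTTPS origin that supports HTTP/2 would do):
# asyncio.run(fetch_many(["https://example.com/"] * 3))
```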
Serverless is one of the hottest topics in cloud services. How does AWS Lambda work behind the scenes?
Lambda is a serverless computing service provided by Amazon Web Services (AWS), which runs functions in response to events.
Firecracker MicroVM
Firecracker is the engine powering all of the Lambda functions [1]. It is a virtualization technology developed at Amazon and written in Rust.
The diagram below illustrates the isolation model for AWS Lambda Workers.
Lambda functions run within a sandbox, which provides a minimal Linux userland plus some common libraries and utilities. The sandbox is the execution environment (worker), which is created on EC2 instances.
How are lambdas initiated and invoked? There are two ways.
Synchronous execution
Step 1: "The Worker Manager communicates with a Placement Service which is responsible to place a workload on a location for the given host (it’s provisioning the sandbox) and returns that to the Worker Manager" [2].
Step 2: "The Worker Manager can then call Init to initialize the function for execution by downloading the Lambda package from S3 and setting up the Lambda runtime" [2]
Step 3: The Frontend Worker is now able to call Invoke [2].
Asynchronous execution
Step 1: The Application Load Balancer forwards the invocation to an available Frontend, which places the event onto an internal queue (SQS).
Step 2: There is "a set of pollers assigned to this internal queue which are responsible for polling it and moving the event onto a Frontend synchronously. After it’s been placed onto the Frontend it follows the synchronous invocation call pattern which we covered earlier" [2].
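To connect this to the two invocation paths, here is a sketch of a minimal handler plus a boto3 client call: InvocationType="RequestResponse" takes the synchronous path, while "Event" queues the invocation asynchronously. The function name and payload are made up for illustration.

```python
import json
import boto3  # AWS SDK for Python

# A minimal Lambda handler (deployed as the function's entry point).
def handler(event, context):
    return {"statusCode": 200, "body": json.dumps({"echo": event})}

# Invoking the function from another service (the function name is hypothetical).
client = boto3.client("lambda")

# Synchronous: the caller waits for the function's result.
sync_resp = client.invoke(
    FunctionName="my-demo-function",
    InvocationType="RequestResponse",
    Payload=json.dumps({"order_id": 123}),
)
print(json.load(sync_resp["Payload"]))

# Asynchronous: Lambda queues the event internally and returns immediately.
async_resp = client.invoke(
    FunctionName="my-demo-function",
    InvocationType="Event",
    Payload=json.dumps({"order_id": 123}),
)
print(async_resp["StatusCode"])  # 202 means the event was accepted for queuing
```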
Question: Can you think of any use cases for AWS Lambda?
Why is a solid-state drive (SSD) fast?
“A solid state drive reads up to 10 times faster and writes up to 20 times faster than a hard disk drive.” [1].
“An SSD is a flash-memory based data storage device. Bits are stored into cells, which are made of floating-gate transistors. SSDs are made entirely of electronic components, there are no moving or mechanical parts like in hard drives (HDD)” [2].
The diagram below illustrates the SSD architecture.
Step 1: “Commands come from the user through the host interface” [2]. The interface can be Serial ATA (SATA) or PCI Express (PCIe).
Step 2: “The processor in the SSD controller takes the commands and passes them to the flash controller” [2].
Step 3: “SSDs also have embedded RAM memory, generally for caching purposes and to store mapping information” [2].
Step 4: “The packages of NAND flash memory are organized in gangs, over multiple channels” [2].
The second diagram illustrates how the logical and physical pages are mapped, and why this architecture is fast.
The SSD controller operates multiple flash chips in parallel, greatly improving the underlying bandwidth. When we need to write more than one page, the SSD controller can write them in parallel [3], whereas an HDD's heads share a single actuator, so it can only read from one location at a time.
Every time a host page is written, the SSD controller finds a physical page to write the data to, and this mapping is recorded. With this mapping, the next time the host reads that page, the SSD knows where in flash to read the data from [3].
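The mapping described above is maintained by the flash translation layer (FTL). Below is a toy sketch, purely illustrative rather than a real controller design, of remapping a logical (host) page to a fresh physical page on every write, since NAND pages cannot be overwritten in place.

```python
class ToyFTL:
    """Toy flash translation layer: map logical (host) pages to physical pages."""

    def __init__(self, num_physical_pages: int):
        self.free_pages = list(range(num_physical_pages))  # unwritten physical pages
        self.mapping = {}  # logical page -> physical page
        self.flash = {}    # physical page -> data

    def write(self, logical_page: int, data: bytes) -> None:
        # NAND pages cannot be overwritten in place, so every write goes to a
        # fresh physical page; the old page becomes garbage to be erased later.
        physical_page = self.free_pages.pop(0)
        self.flash[physical_page] = data
        self.mapping[logical_page] = physical_page

    def read(self, logical_page: int) -> bytes:
        # The recorded mapping tells the controller where in flash to read from.
        return self.flash[self.mapping[logical_page]]

# ftl = ToyFTL(num_physical_pages=8)
# ftl.write(0, b"hello"); ftl.write(0, b"hello v2")
# assert ftl.read(0) == b"hello v2"
```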
Which cloud provider should be used when building a big data solution?
The diagram below illustrates the detailed comparison of AWS, Google Cloud, and Microsoft Azure.
The common parts of the solutions:
Data ingestion of structured or unstructured data.
Raw data storage.
Data processing, including filtering, transformation, normalization, etc.
Data warehouse, including key-value storage, relational database, OLAP database, etc.
Presentation layer with dashboards and real-time notifications.
It is interesting to see different cloud vendors have different names for the same type of products.
For example, the first step and the last step both use a serverless product. The product is called "Lambda" in AWS and "Functions" in Azure and Google Cloud.
Question - which products have you used in production? What kind of applications did you use them for?
What are the differences between Virtualization (VMware) and Containerization (Docker)?
The diagram below illustrates the layered architecture of virtualization and containerization.
“Virtualization is a technology that allows you to create multiple simulated environments or dedicated resources from a single, physical hardware system” [1].
“Containerization is the packaging together of software code with all its necessary components like libraries, frameworks, and other dependencies so that they are isolated in their own 'container'” [2].
The major differences are:
🔹 In virtualization, the hypervisor creates an abstraction layer over hardware, so that multiple operating systems can run alongside each other. This technique is considered to be the first generation of cloud computing.
🔹Containerization is considered to be a lightweight version of virtualization, which virtualizes the operating system instead of hardware. Without the hypervisor, the containers enjoy faster resource provisioning. All the resources (including code, dependencies) that are needed to run the application or microservice are packaged together, so that the applications can run anywhere.
Question: how much performance differences have you observed in production between virtualization, containerization, and bare-metal?
In our 4-step system design interview framework, the second step is “Propose High-level Design and Get Buy-in”. The objective of this step is to come up with a high-level design diagram for the problem at hand and establish a common ground for further exploration.
It is a red flag to get carried away this early in the session with premature optimizations. For example, some candidates love to talk about caching and sharding in this step.
Optimizations are complicated and expensive. They require solid justifications for the added complexity. At this early stage, the justifications are often hand-wavy. The added complexity distracts the interviewer and prevents them from understanding the high-level design.
Focus on the task at hand. If you find yourself getting distracted by optimization ideas, table them. Make a list of ideas to revisit in the deep dive section.
Keep the high-level design simple. This step should take about 15 minutes.
What are some other premature optimizations that can be tabled?