We have helped over 100,000 engineers level up their system design skills in the past few years.
Here is a simple but powerful step-by-step framework we put together to help you crack system design interviews.
The Framework:
Step 1 - Understand the problem and establish the design scope
Step 2 - Propose high-level design and get buy-in
Step 3 - Design deep dive
Step 4 - Wrap up
In this 10-minute video, we go deeper into each step:
Why do we need a framework
What to focus on
How to discuss trade-offs
Dos and Don'ts
Common mistakes to avoid
Tips:
Don't go into deep details too early
Don't mention specific technologies during the high-level design step
Choose specific databases only after defining the data model and analyzing the data access patterns
In the deep dive, pick one or two problems to explore in depth, and list at least two options with their trade-offs for each
Caching is one of the most commonly used techniques when building fast online systems. When using a cache, here are the top 5 things to consider:
The first version of the cheatsheet was written by guest author Love Sharma.
Suitable Scenarios:
In-memory solution
Read heavy system
Data is not frequently updated
Caching Techniques:
Cache aside
Write-through
Read-through
Write-around
Write-back
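Of the techniques above, cache-aside is the most widely used. Here is a minimal sketch of its read path, assuming a Redis cache via the redis-py client and a hypothetical db_query helper:

```python
import json
import redis  # assumes the redis-py client is installed

cache = redis.Redis(host="localhost", port=6379)

def get_user(user_id, db_query):
    # 1. Look in the cache first
    cached = cache.get(f"user:{user_id}")
    if cached is not None:
        return json.loads(cached)

    # 2. On a miss, the application (not the cache) reads the database
    user = db_query(user_id)

    # 3. Populate the cache for subsequent reads, with a TTL
    cache.set(f"user:{user_id}", json.dumps(user), ex=3600)
    return user
```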
Cache Eviction Algorithms:
Least Recently Used (LRU)
Least Frequently Used (LFU)
First-in First-out (FIFO)
Random Replacement (RR)
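For intuition, here is a minimal LRU sketch built on Python's OrderedDict. Production caches use more sophisticated structures, but the eviction rule is the same:

```python
from collections import OrderedDict

class LRUCache:
    def __init__(self, capacity):
        self.capacity = capacity
        self.data = OrderedDict()

    def get(self, key):
        if key not in self.data:
            return None
        self.data.move_to_end(key)          # mark as most recently used
        return self.data[key]

    def put(self, key, value):
        if key in self.data:
            self.data.move_to_end(key)
        self.data[key] = value
        if len(self.data) > self.capacity:  # evict the least recently used entry
            self.data.popitem(last=False)
```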
Key Metrics:
Cache Hit Ratio
Latency
Throughput
Invalidation Rate
Memory Usage
CPU usage
Network usage
Other issues:
Thundering herd on cold start
Time-to-live (TTL)
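One common mitigation for the thundering herd problem is adding jitter to TTLs so that hot keys don't all expire at the same moment. A small sketch, assuming a Redis cache:

```python
import random
import redis

cache = redis.Redis()

def set_with_jitter(key, value, base_ttl=3600, jitter=300):
    # Spread expirations over a window so hot keys don't all expire simultaneously
    ttl = base_ttl + random.randint(0, jitter)
    cache.set(key, value, ex=ttl)
```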
Immutability here means that once data is written into Git, it cannot be changed. Modifications only create new data versions. The old data remains unchanged.
Immutable system designs are commonly used in systems that require high levels of auditability, such as financial systems and version control systems. Here's how it's used in Git design:
🔹Users' local Git storage consists of three sections: working copy, staging area, and local repository.
🔹The working copy contains the files you are currently working on. The data is mutable, so you can do whatever you want with it.
🔹When you type "git add", your files are added to the staging area. These files are now immutable; it is no longer possible to edit them.
🔹When you type "git commit", your staged files are added to the local repository. The local repository is a tree-structured version of an append-only write-ahead log (WAL). Both are immutable: you can only append to the end of the data structure.
🔹When you type "git push", your local repository data is synced to the remote repository. Since the remote repository uses the same data structure as your local repository, it is also immutable: you can only add data to it.
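To make the append-only idea concrete, here is a toy content-addressed object store in the spirit of Git's design (not Git's actual implementation): every write produces a new object keyed by the hash of its content, and existing objects are never modified.

```python
import hashlib

class ImmutableObjectStore:
    def __init__(self):
        self.objects = {}   # hash -> content, append-only

    def write(self, content: bytes) -> str:
        # The key is derived from the content itself, so identical content
        # always maps to the same object and existing objects never change.
        key = hashlib.sha1(content).hexdigest()
        self.objects.setdefault(key, content)
        return key

    def read(self, key: str) -> bytes:
        return self.objects[key]

store = ImmutableObjectStore()
v1 = store.write(b"hello")         # first version
v2 = store.write(b"hello world")   # a "modification" creates a new object
assert store.read(v1) == b"hello"  # the old version remains unchanged
```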
👉 Over to you: there are two ways to save the history of a file: either save every version of the file, or save every delta change to the file and reconstruct the file by aggregating all delta changes. Would you recommend Git using one over another, and why?
The diagram below shows how we can use popular AI writers.
In general, the tools cover the workflow of copywriting and content creation. They can choose topics, write content, and optimize the content.
For example, we can first use SurferSEO to extract keywords and topics and then use Jasper or Writesonic to generate marketing content.
If we want to customize the tone for different audiences, we can use Wordtune to paraphrase the articles.
ChatGPT-like AI writers have gained so much popularity because AI tools finally generate revenue!
Most people think Redis is just for caching.
But Redis can do so much more than that. It is good for:
Session store
Distributed lock
Counter
Rate limiter
Ranking/leaderboard
etc.
In this video, we get insights into how Redis solves interesting scalability challenges and learn why it is a great tool to know well in our system design toolset.
Game Leaderboard with sorted set
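As a quick illustration of the leaderboard use case, here is a sketch using the redis-py client and a sorted set (the key and player names are made up for the example):

```python
import redis

r = redis.Redis()

def record_score(player: str, points: int):
    # ZINCRBY keeps the set ordered by score as players earn points
    r.zincrby("game:leaderboard", points, player)

def top_players(n: int = 10):
    # Highest scores first, with their scores
    return r.zrevrange("game:leaderboard", 0, n - 1, withscores=True)
```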
The diagram below shows 6 common algorithms.
Static Algorithms
Round robin: Client requests are sent to different service instances in sequential order. The services are usually required to be stateless.
Sticky round-robin: This is an improvement on the round-robin algorithm. If Alice's first request goes to service A, the following requests go to service A as well.
Weighted round-robin: The admin can specify a weight for each service. Services with a higher weight handle more requests than others.
Hash: This algorithm applies a hash function to the incoming request's IP or URL. Requests are routed to the relevant instances based on the hash result.
Dynamic Algorithms
Least connections: A new request is sent to the service instance with the fewest concurrent connections.
Least response time: A new request is sent to the service instance with the fastest response time.
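Here is a rough sketch of how a few of these selection rules differ, assuming a simple list of backend instances; a real load balancer tracks live connection counts and health checks:

```python
import hashlib
from itertools import cycle

instances = ["10.0.0.1", "10.0.0.2", "10.0.0.3"]

# Round robin: rotate through instances in order
rr = cycle(instances)
def round_robin():
    return next(rr)

# Hash: the same client IP is always routed to the same instance
def ip_hash(client_ip: str):
    digest = int(hashlib.md5(client_ip.encode()).hexdigest(), 16)
    return instances[digest % len(instances)]

# Least connections: pick the instance with the fewest active connections
connections = {inst: 0 for inst in instances}
def least_connections():
    return min(connections, key=connections.get)
```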
👉 Over to you:
Which algorithm is most popular?
We can use other attributes for hashing algorithms. For example, HTTP header, request type, client type, etc. What attributes have you used?
Think you know how VPNs work? Think again! 😳 It's so complex.
The architecture of a potential experiment platform is depicted in the diagram below. The content of this visual is from the book "Trustworthy Online Controlled Experiments" (redrawn by me). The platform contains 4 high-level components.
Experiment definition, setup, and management via a UI. They are stored in the experiment system configuration.
Experiment deployment to both the server and client-side (covers variant assignment and parameterization as well).
Experiment instrumentation.
Experiment analysis.
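For the variant-assignment piece, a common approach is deterministic hash-based bucketing so that a given user always sees the same variant. A minimal sketch (the experiment name and 50/50 split are illustrative):

```python
import hashlib

def assign_variant(user_id: str, experiment: str, variants=("control", "treatment")):
    # Hash user + experiment so assignments are stable per user
    # but independent across experiments.
    digest = hashlib.sha256(f"{experiment}:{user_id}".encode()).hexdigest()
    bucket = int(digest, 16) % 100
    # 50/50 split in this sketch
    return variants[0] if bucket < 50 else variants[1]

print(assign_variant("user-42", "new-checkout-flow"))
```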
The book's author Ronny Kohavi also teaches a live Zoom class on Accelerating Innovation with A/B Testing. The class focuses on concepts, culture, trust, and limitations.
After implementing the primary-replica architecture, most applications should be able to scale to several hundred thousand users, and some simple applications might be able to reach a million users.
However, for some read-heavy applications, primary-replica architecture might not be able to handle traffic spikes well. For our e-commerce example, flash sale events like Black Friday sales in the United States could easily overload the databases. If the load is sufficiently heavy, some users might not even be able to load the sales page.
The next logical step to handle such situations is to add a cache layer to optimize the read operations.
Redis is a popular in-memory cache for this purpose. Redis reduces the read load for a database by caching frequently accessed data in memory. This allows for faster access to the data since it is retrieved from the cache instead of the slower database. By reducing the number of read operations performed on the database, Redis helps to reduce the load on the database cluster and improve its overall scalability. As summarized below by Jeff Dean et al, in-memory access is 1000X faster than disk access.
For our example application, we deploy the cache using the read-through caching strategy. With this strategy, data is first checked in the cache before being read from the database. If the data is found in the cache, it is returned immediately, otherwise, it is loaded from the database and stored in the cache for future use.
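Here is a minimal sketch of that read path, where a cache wrapper owns the database loader; the key names and loader function are illustrative:

```python
import redis

class ReadThroughCache:
    def __init__(self, loader, ttl=300):
        self.redis = redis.Redis()
        self.loader = loader   # called only on a cache miss
        self.ttl = ttl

    def get(self, key: str):
        value = self.redis.get(key)
        if value is not None:
            return value                         # cache hit
        value = self.loader(key)                 # cache miss: load from the database
        self.redis.set(key, value, ex=self.ttl)  # store for future reads
        return value

# Usage: the application always talks to the cache wrapper, never to the database directly
# product_cache = ReadThroughCache(loader=fetch_product_from_db)
```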
There are other cache strategies and operational considerations when deploying a caching layer at scale. For example, with another copy of data stored in the cache, we have to maintain data consistency. We will have a deep dive series on caching soon to explore this topic in much greater detail.
There is another class of application data that is highly cacheable: the static contents for the application, such as images, videos, style sheets, and application bundles, which are infrequently updated. They should be served by a Content Delivery Network (CDN).
A CDN serves the static content from a network of servers located closer to the end user, reducing latency, and improving the loading speed of the web pages. This results in a better user experience, especially for users located far away from the application server.
A cache layer can provide some relief for read-heavy applications. However, as we continue to scale, the amount of write requests will start to overload the single primary database. This is when it might make sense to shard the primary database.
There are two ways to shard a database: horizontally or vertically.
Horizontal sharding is more common. It is a database partitioning technique that divides data across multiple database servers based on the values in one or more columns of a table. For example, a large user table can be partitioned based on user ID. It results in multiple smaller tables stored on separate database servers, with each handling a small subset of the rows that were previously handled by the single primary database.
Vertical sharding is less common. It separates tables or parts of a table into different database servers based on the specific needs of the application. This optimizes the application based on specific access patterns of each column.
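To make horizontal sharding concrete, here is a sketch of routing by user ID with simple hash-modulo placement; real systems typically use consistent hashing or a lookup service to make resharding easier:

```python
import hashlib

SHARDS = ["db-shard-0", "db-shard-1", "db-shard-2", "db-shard-3"]

def shard_for_user(user_id: str) -> str:
    # Every row for a given user lives on exactly one shard
    digest = int(hashlib.md5(user_id.encode()).hexdigest(), 16)
    return SHARDS[digest % len(SHARDS)]

# A query for one user touches a single shard; queries spanning many users
# must fan out across shards, which is where much of the complexity comes from.
print(shard_for_user("user-12345"))
```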
Database sharding has some significant drawbacks.
First, sharding adds complexity to the application and database layers. Data must be partitioned and distributed across multiple databases, making it difficult to ensure data consistency and integrity.
Second, sharding introduces performance overhead, increasing application latency, especially for operations that require data from multiple shards.
First, let's clarify some concepts before discussing the differences.
NLB (Network Load Balancer) is usually deployed before the API gateway, handling traffic routing based on IP. It does not parse the HTTP requests.
ALB (Application Load Balancer) routes requests based on HTTP header or URL and thus can provide richer routing rules. We can choose the load balancer based on routing requirements. For simple services with a smaller scale, one load balancer is enough.
The API gateway performs tasks more on the application level. So it has different responsibilities from the load balancer.
The diagram below shows the detail. Often, they are used in combination to provide a scalable and secure architecture for modern web apps.
Option a: ALB is used to distribute requests among different services. Because the services implement their own rate limiting, authentication, etc., this approach is more flexible but requires more work at the service level.
Option b: An API gateway takes care of authentication, rate limiting, caching, etc., so there is less work at the service level. However, this option is less flexible compared with the ALB approach.
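As one concrete example of an API gateway responsibility, here is a minimal token-bucket rate limiter sketch. It is in-memory and single-node; a real gateway would typically keep the counters in a shared store such as Redis:

```python
import time

class TokenBucket:
    def __init__(self, rate: float, capacity: int):
        self.rate = rate              # tokens refilled per second
        self.capacity = capacity      # maximum burst size
        self.tokens = capacity
        self.last_refill = time.monotonic()

    def allow(self) -> bool:
        now = time.monotonic()
        # Refill tokens based on elapsed time, capped at capacity
        self.tokens = min(self.capacity, self.tokens + (now - self.last_refill) * self.rate)
        self.last_refill = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True               # request allowed
        return False                  # request rejected (rate limited)

# Example: allow 5 requests per second with bursts of up to 10
limiter = TokenBucket(rate=5, capacity=10)
```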
A picture is worth a thousand words. ChatGPT seems to have come out of nowhere. Little did we know that it was built on top of decades of research.
The diagram below shows how we get here.
1950s: In this stage, people still used primitive rule-based models.
1980s: Since the 1980s, machine learning started to pick up and was used for classification. Training was conducted on relatively small datasets.
1990s - 2000s: Since the 1990s, neural networks have been used to imitate the human brain for labeling and training. There are generally 3 types:
CNN (Convolutional Neural Network): often used in visual-related tasks.
RNN (Recurrent Neural Network): useful in natural language tasks.
GAN (Generative Adversarial Network): composed of two networks (generative and discriminative). This is a generative model that can produce novel images that look realistic.
2017: “Attention Is All You Need” laid the foundation of generative AI. The transformer model greatly shortens training time through parallelism.
2018 - Now: In this stage, thanks to the major progress of the transformer model, we see various models trained on massive amounts of data. Human demonstrations became the learning content of the models. We’ve seen many AI writers that can write articles, news, technical docs, and even code. This has great commercial value and has set off a global whirlwind.
YouTube handles 500+ hours of video content uploads every minute on average. How does it manage this?
The diagram below shows YouTube’s innovative hardware encoding published in 2021.
Traditional Software Encoding
YouTube’s mission is to transcode raw video into different compression rates to adapt to different viewing devices - mobile (720p), laptop (1080p), or high-resolution TV (4K).
Creators upload a massive amount of video content to YouTube every minute. During the COVID-19 pandemic especially, video consumption increased greatly as people sheltered at home. Software-based encoding became slow and costly. This meant there was a need for a specialized processing brain tailor-made for video encoding/decoding.
YouTube’s Transcoding Brain - VCU
Just as GPUs and TPUs are used for graphics and machine learning calculations, YouTube developed the VCU (Video transCoding Unit) for warehouse-scale video processing.
Each cluster has a number of VCU accelerated servers. Each server has multiple accelerator trays, each containing multiple VCU cards. Each card has encoders, decoders, etc. [1]
The VCU cluster generates video content at different resolutions and stores it in cloud storage.
This new design brought 20-33x improvements in computing efficiency compared to the previous optimized system. [2]
Over to you: Why is a specialized chip so much faster than a software-based solution?
This article is written by guest author Love Sharma. You can read the full article here.
CDNs are distributed server networks that help improve the performance, reliability, and security of content delivery on the internet.
Here is what the overall CDN diagram explains:
Edge servers are located closer to the end user than traditional servers, which helps reduce latency and improve website performance.
Edge computing is a type of computing that processes data closer to the end user rather than in a centralized data center. This helps to reduce latency and improve the performance of applications that require real-time processing, such as video streaming or online gaming.
Cloud gaming is online gaming that uses cloud computing to provide users with high-quality, low-latency gaming experiences.
Together, these technologies are transforming how we access and consume digital content. By providing faster, more reliable, and more immersive experiences for users, they are helping to drive the growth of the digital economy and create new opportunities for businesses and consumers alike.
In the first two parts of this series, we explored the traditional approach to building and scaling an application. It started with a single server that ran everything, and gradually evolved to a microservice architecture that could support millions of daily active users.
In the final two parts of this series, we examine the impact of recent trends like cloud and serverless computing, along with the proliferation of client application frameworks and the associated developer ecosystem. We explore how these trends alter the way we build applications, especially for early-stage startups where time-to-market is critical, and provide valuable insights on how to incorporate these modern approaches when creating your next big hit.
Let’s start by briefly explaining these computing trends we mentioned.
The first trend is cloud computing. Cloud computing, in its most basic form, is running applications on computing resources managed by cloud providers. When using cloud computing, we do not have to purchase or manage hardware ourselves.
The second trend is serverless computing. Serverless computing builds on the convenience of cloud computing with even more automation. It enables developers to build and run applications without having to provision cloud servers. The serverless provider handles the infrastructure and automatically scales the computing resources up or down as needed. This provides a great developer experience since developers can focus on the application code itself, without having to worry about scaling.
The third trend rides on the waves of the first two. It is the proliferation of the client application frameworks and the frontend hosting platforms that make deploying these frontend applications effortless.