In modern architecture, systems are broken up into small and independent building blocks with well-defined interfaces between them. Message queues provide communication and coordination for those building blocks. Today, let’s discuss different delivery semantics: at-most once, at-least once, and exactly once.
𝐀𝐭-𝐦𝐨𝐬𝐭 𝐨𝐧𝐜𝐞 As the name suggests, at-most once means a message will be delivered no more than once. Messages may be lost but are not redelivered. This is how at-most once delivery works at a high level.
Use cases: It is suitable for use cases like monitoring metrics, where a small amount of data loss is acceptable.
𝐀𝐭-𝐥𝐞𝐚𝐬𝐭 𝐨𝐧𝐜𝐞 With this data delivery semantic, it’s acceptable to deliver a message more than once, but no message should be lost.
Use cases: With at-least once, messages won’t be lost but the same message might be delivered multiple times. While not ideal from a user perspective, at-least once delivery semantics are usually good enough for use cases where data duplication is not a big issue or deduplication is possible on the consumer side. For example, with a unique key in each message, duplicate writes to the database can be rejected.
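Here is a minimal sketch of consumer-side deduplication, assuming a hypothetical processed table keyed by the message’s unique ID; the primary-key constraint is what rejects a redelivered message.

```python
# Consumer-side deduplication sketch (table and message names are illustrative).
import sqlite3

db = sqlite3.connect("consumer.db")
db.execute("""CREATE TABLE IF NOT EXISTS processed (
                  message_id TEXT PRIMARY KEY,
                  payload    TEXT
              )""")

def handle(message_id: str, payload: str) -> None:
    try:
        # The primary key makes this insert fail for a redelivered message.
        db.execute("INSERT INTO processed (message_id, payload) VALUES (?, ?)",
                   (message_id, payload))
        db.commit()
        # ...apply the message's side effects here, only on first delivery
    except sqlite3.IntegrityError:
        db.rollback()  # duplicate delivery: safely ignore

handle("order-123", "charge $10")
handle("order-123", "charge $10")  # redelivered message is rejected, not processed twice
```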
𝐄𝐱𝐚𝐜𝐭𝐥𝐲 𝐨𝐧𝐜𝐞 Exactly once is the most difficult delivery semantic to implement. It is friendly to users, but it has a high cost for the system’s performance and complexity.
Use cases: Financial-related use cases (payment, trading, accounting, etc.). Exactly once is especially important when duplication is not acceptable and the downstream service or third party doesn’t support idempotency.
Question: what is the difference between message queues and event streaming platforms such as Kafka and Apache Pulsar?
You have probably heard of SWIFT. What is SWIFT? What role does it play in cross-border payments? You can find answers to those questions in this post.
The Society for Worldwide Interbank Financial Telecommunication (SWIFT) is the main secure messaging system that links the world’s banks.
The Belgium-based system is run by its member banks and handles millions of payment messages per day. The diagram below illustrates how payment messages are transmitted from Bank A (in New York) to Bank B (in London).
Step 1: Bank A sends a message with transfer details to Regional Processor A in New York. The destination is Bank B.
Step 2: Regional Processor A validates the format and sends the message to Slice Processor A. The Regional Processor is responsible for input message validation and output message queuing. The Slice Processor is responsible for storing and routing messages safely.
Step 3: Slice Processor A stores the message.
Step 4: Slice Processor A informs Regional Processor A that the message is stored.
Step 5: Regional Processor A sends ACK/NAK to Bank A. ACK means a message will be sent to Bank B. NAK means the message will NOT be sent to Bank B.
Step 6: Slice Processor A sends the message to Regional Processor B in London.
Step 7: Regional Processor B stores the message temporarily.
Step 8: Regional Processor B assigns a unique ID, the MON (Message Output Number), to the message and sends it to Slice Processor B.
Step 9: Slice Processor B validates MON.
Step 10: Slice Processor B authorizes Regional Processor B to send the message to Bank B.
Step 11: Regional Processor B sends the message to Bank B.
Step 12: Bank B receives the message and stores it.
Step 13: Bank B sends UAK/UNK to Regional Processor B. UAK (user positive acknowledgment) means Bank B received the message without error; UNK (user negative acknowledgment) means Bank B detected a checksum failure.
Step 14: Regional Processor B creates a report based on Bank B’s response, and sends it to Slice Processor B.
Step 15: Slice Processor B stores the report.
Step 16 - 17: Slice Processor B sends a copy of the report to Slice Processor A. Slice Processor A stores the report.
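To make the flow above concrete, here is a toy sketch of the first few steps (validation, safe storage, and ACK/NAK back to the sending bank). All class, field, and method names are hypothetical; real SWIFT messages and acknowledgments are far more involved.

```python
# Toy model of steps 1-5: validate, store safely, then acknowledge to the sender.
from dataclasses import dataclass, field

@dataclass
class SliceProcessor:
    stored: list = field(default_factory=list)

    def store(self, msg) -> None:        # steps 3-4: store the message safely
        self.stored.append(msg)

@dataclass
class RegionalProcessor:
    slice_processor: SliceProcessor

    def send(self, msg) -> str:
        if not msg.get("amount"):        # step 2: input message validation
            return "NAK"                 # step 5: the message will NOT be forwarded
        self.slice_processor.store(msg)  # step 3: hand off for safe storage
        return "ACK"                     # step 5: the message will be forwarded

slice_a = SliceProcessor()
regional_a = RegionalProcessor(slice_a)
print(regional_a.send({"from": "Bank A", "to": "Bank B", "amount": 100}))  # ACK
print(regional_a.send({"from": "Bank A", "to": "Bank B"}))                 # NAK
```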
Why is Redis so fast? There are 3 main reasons as shown in the diagram below.
Redis is a RAM-based database. RAM access is at least 1000 times faster than random disk access.
Redis leverages IO multiplexing and a single-threaded execution loop for execution efficiency.
Redis leverages several efficient lower-level data structures.
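As an illustration of the second point, here is a minimal sketch of IO multiplexing with a single-threaded loop, using Python's selectors module and a toy in-memory key-value store. This only demonstrates the idea; it is not how Redis is actually implemented (Redis is written in C).

```python
# One thread serves many connections by waiting on all sockets at once.
import selectors
import socket

sel = selectors.DefaultSelector()
server = socket.socket()
server.bind(("localhost", 6380))
server.listen()
server.setblocking(False)
sel.register(server, selectors.EVENT_READ)

store = {}  # toy in-memory key-value store

while True:
    for key, _ in sel.select():               # block until any socket is readable
        sock = key.fileobj
        if sock is server:                    # new client connection
            conn, _ = server.accept()
            conn.setblocking(False)
            sel.register(conn, selectors.EVENT_READ)
        else:                                 # request from an existing client
            data = sock.recv(1024)
            if not data:
                sel.unregister(sock)
                sock.close()
                continue
            parts = data.decode().split()     # toy protocol: "SET k v" or "GET k"
            if len(parts) == 3 and parts[0] == "SET":
                store[parts[1]] = parts[2]
                sock.send(b"OK\n")
            elif len(parts) == 2 and parts[0] == "GET":
                sock.send((store.get(parts[1], "") + "\n").encode())
```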
How can we optimize performance when we upload large files to an object storage service such as S3?
Before we answer this question, let's take a look at why we need to optimize this process. Some files might be larger than a few GBs. It is possible to upload such a large file directly, but it could take a long time. If the network connection fails in the middle of the upload, we have to start over. A better solution is to slice a large object into smaller parts and upload them independently. After all the parts are uploaded, the object store reassembles the object from the parts. This process is called multipart upload.
The diagram below illustrates how multipart upload works:
The client calls the object storage to initiate a multipart upload.
The data store returns an uploadID, which uniquely identifies the upload.
The client splits the large file into smaller parts and starts uploading. Let’s assume the size of the file is 1.6 GB and the client splits it into 8 parts, so each part is 200 MB in size. The client uploads the first part to the data store together with the uploadID it received in step 2.
When a part is uploaded, the data store returns an ETag, which is essentially the md5 checksum of that part. It is used to verify multipart uploads.
After all parts are uploaded, the client sends a complete multipart upload request, which includes the uploadID, part numbers, and ETags.
The data store reassembles the object from its parts based on the part number. Since the object is really large, this process may take a few minutes. After reassembly is complete, it returns a success message to the client.
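These steps map closely onto the S3 API. Here is a minimal sketch using boto3; the bucket name, object key, file path, and 200 MB part size are illustrative.

```python
# Multipart upload sketch with boto3 (bucket, key, and path are placeholders).
import boto3

s3 = boto3.client("s3")
bucket, key, path = "my-bucket", "big-file.bin", "big-file.bin"
part_size = 200 * 1024 * 1024  # 200 MB per part, as in the example above

# Steps 1-2: initiate the multipart upload and receive an uploadID.
upload_id = s3.create_multipart_upload(Bucket=bucket, Key=key)["UploadId"]

# Steps 3-4: upload each part; the data store returns an ETag per part.
parts = []
with open(path, "rb") as f:
    part_number = 1
    while chunk := f.read(part_size):
        resp = s3.upload_part(Bucket=bucket, Key=key, UploadId=upload_id,
                              PartNumber=part_number, Body=chunk)
        parts.append({"PartNumber": part_number, "ETag": resp["ETag"]})
        part_number += 1

# Steps 5-6: complete the upload with the uploadID, part numbers, and ETags.
s3.complete_multipart_upload(Bucket=bucket, Key=key, UploadId=upload_id,
                             MultipartUpload={"Parts": parts})
```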
What are the top caching strategies?
Read data from the system: 🔹 Cache aside 🔹 Read through
Write data to the system: 🔹 Write around 🔹 Write back 🔹 Write through
The diagram below illustrates how those 5 strategies work. Some of the caching strategies can be used together.
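As a small illustration, here is a sketch of the cache-aside read path combined with a write-through style write. The cache dict, TTL, and fetch_user_from_db are stand-ins for a real cache and database.

```python
# Cache-aside read + write-through write (all names are illustrative).
import time

cache = {}   # key -> (value, expires_at)
TTL = 60     # seconds

def fetch_user_from_db(user_id):
    ...      # placeholder for a real database query
    return {"id": user_id, "name": "example"}

def get_user(user_id):                                # cache-aside read path
    key = f"user:{user_id}"
    entry = cache.get(key)
    if entry and entry[1] > time.time():              # cache hit
        return entry[0]
    value = fetch_user_from_db(user_id)               # cache miss: read the database
    cache[key] = (value, time.time() + TTL)           # populate the cache for later reads
    return value

def update_user(user_id, value):                      # write-through style write
    ...      # placeholder for the database write
    cache[f"user:{user_id}"] = (value, time.time() + TTL)
```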
I left out a lot of details as that will make the post very long. Feel free to leave a comment so we can learn from each other.
Popular interview question: how to diagnose a mysterious process that’s consuming too much CPU, memory, IO, etc.?
The diagram below illustrates helpful tools in a Linux system.
🔹‘vmstat’ - reports information about processes, memory, paging, block IO, traps, and CPU activity.
🔹‘iostat’ - reports CPU and input/output statistics of the system.
🔹‘netstat’ - displays statistical data related to IP, TCP, UDP, and ICMP protocols.
🔹‘lsof’ - lists open files of the current system.
🔹‘pidstat’ - monitors the utilization of system resources by all or specified processes, including CPU, memory, device IO, task switching, threads, etc.
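The listed command-line tools are the standard starting point. For a programmatic view of a single process, the psutil library (not mentioned above, just an alternative sketch) exposes similar numbers.

```python
# Per-process resource snapshot with psutil (the PID is a placeholder).
import os
import psutil

pid = os.getpid()  # replace with the PID of the suspicious process
p = psutil.Process(pid)

print("CPU %:", p.cpu_percent(interval=1.0))      # roughly what pidstat shows
print("RSS MB:", p.memory_info().rss / 1024**2)   # resident memory
print("Threads:", p.num_threads())
print("Open files:", len(p.open_files()))         # roughly what lsof lists
print("IO counters:", p.io_counters())            # read/write counters, like iostat/pidstat
```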
Caching is awesome but it doesn’t come without a cost, just like many things in life.
One of the issues is 𝐂𝐚𝐜𝐡𝐞 𝐌𝐢𝐬𝐬 𝐀𝐭𝐭𝐚𝐜𝐤. Please correct me if this is not the right term. It refers to the scenario where the data to fetch doesn't exist in the database and isn’t cached either. So every request hits the database eventually, defeating the purpose of using a cache. If a malicious user initiates lots of queries with such keys, the database can easily be overloaded.
The diagram below illustrates the process.
Two approaches are commonly used to solve this problem:
🔹Cache keys with null value. Set a short TTL (Time to Live) for keys with null value.
🔹Using a Bloom filter. A Bloom filter is a data structure that can rapidly tell us whether an element is present in a set or not. If the Bloom filter says the key may exist, the request first goes to the cache and then queries the database if needed. If the Bloom filter says the key doesn't exist, the key is in neither the cache nor the database, so the query will not hit the cache or database layer.
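Here is a minimal sketch of both mitigations together: a toy Bloom filter consulted first, and null values cached with a short TTL. The BloomFilter class, cache dict, and query_db callback are illustrative.

```python
# Toy Bloom filter + null-value caching (all names are illustrative).
import hashlib
import time

class BloomFilter:
    def __init__(self, size=1024, hashes=3):
        self.size, self.hashes, self.bits = size, hashes, 0

    def _positions(self, key):
        for i in range(self.hashes):
            digest = hashlib.sha256(f"{i}:{key}".encode()).digest()
            yield int.from_bytes(digest[:4], "big") % self.size

    def add(self, key):                 # call this whenever a real key is written to the DB
        for pos in self._positions(key):
            self.bits |= 1 << pos

    def might_contain(self, key):
        return all((self.bits >> pos) & 1 for pos in self._positions(key))

bloom = BloomFilter()
cache = {}  # key -> (value, expires_at)

def get(key, query_db):
    if not bloom.might_contain(key):
        return None                              # definitely absent: skip cache and database
    entry = cache.get(key)
    if entry and entry[1] > time.time():
        return entry[0]                          # cache hit (possibly a cached null)
    value = query_db(key)                        # may be None if the row doesn't exist
    ttl = 300 if value is not None else 10       # short TTL for null values
    cache[key] = (value, time.time() + ttl)
    return value
```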
Understanding the tradeoffs is very important not only in system design interviews but also in designing real-world systems. When we talk about data replication, there is a fundamental tradeoff between latency and consistency. It is illustrated by the diagram below.
Optimistic locking, also referred to as optimistic concurrency control, allows multiple concurrent users to attempt to update the same resource.
There are two common ways to implement optimistic locking: version number and timestamp. Version number is generally considered to be a better option because the server clock can be inaccurate over time. We explain below how optimistic locking works with version numbers.
The diagram below shows a successful case and a failure case.
A new column called “version” is added to the database table.
Before a user modifies a database row, the application reads the version number of the row.
When the user updates the row, the application increases the version number by 1 and writes it back to the database.
A database validation check is put in place: the new version number must exceed the current version number by exactly 1. The transaction aborts if the validation fails, and the user retries from step 2.
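Here is a minimal sketch of the version check from these steps, using SQLite. The rooms table and column names are illustrative; the key point is that the UPDATE only succeeds if the version read in step 2 is still the one stored in the database.

```python
# Optimistic locking with a version column (table and data are illustrative).
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE rooms (id INTEGER PRIMARY KEY, available INTEGER, version INTEGER)")
db.execute("INSERT INTO rooms VALUES (1, 5, 1)")

def reserve_room(room_id):
    available, version = db.execute(
        "SELECT available, version FROM rooms WHERE id = ?", (room_id,)).fetchone()
    if available == 0:
        return False
    # Write back with version + 1, but only if nobody else updated the row in between.
    cursor = db.execute(
        "UPDATE rooms SET available = ?, version = ? WHERE id = ? AND version = ?",
        (available - 1, version + 1, room_id, version))
    db.commit()
    return cursor.rowcount == 1   # 0 rows updated means the version check failed: retry

print(reserve_room(1))  # True on success; False means the caller should retry from the read
```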
Optimistic locking is usually faster than pessimistic locking because we do not lock the database. However, the performance of optimistic locking drops dramatically when concurrency is high.
To understand why, consider the case when many clients try to reserve a hotel room at the same time. Because there is no limit on how many clients can read the available room count, all of them read back the same available room count and the current version number. When different clients make reservations and write back the results to the database, only one of them will succeed, and the rest of the clients receive a version check failure message. These clients have to retry. In the subsequent round of retries, there is only one successful client again, and the rest have to retry. Although the end result is correct, repeated retries cause a very unpleasant user experience.
Popular interview question - what are the differences between Redis and Memcached?
The diagram below illustrates the key differences.
The advantages of data structures make Redis a good choice for:
🔹 Recording the number of clicks and comments for each post (hash)
🔹 Sorting the commented user list and deduping the users (zset)
🔹 Caching user behavior history and filtering malicious behaviors (zset, hash)
🔹 Storing boolean information for an extremely large data set in a small amount of space. For example, login status, membership status. (bitmap)
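Here is a minimal sketch of those use cases with redis-py; the key names and IDs are illustrative.

```python
# Redis data structure examples with redis-py (keys and IDs are illustrative).
import redis

r = redis.Redis(host="localhost", port=6379)

# Hash: count clicks and comments per post.
r.hincrby("post:1001:stats", "clicks", 1)
r.hincrby("post:1001:stats", "comments", 1)

# Sorted set (zset): deduplicated list of commenting users, ordered by timestamp.
r.zadd("post:1001:commenters", {"user:42": 1700000000})
r.zadd("post:1001:commenters", {"user:42": 1700000300})  # same user, score just updates

# Bitmap: one bit per user ID, e.g. daily login status, in very little space.
r.setbit("login:2024-01-01", 42, 1)
print(r.getbit("login:2024-01-01", 42))  # 1 if user 42 logged in on that day
```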