Leaky bucket/Token bucket, GeoHash, Quadtree, Leader election, Consensus

11. Leaky bucket / Token bucket

The leaky bucket and token bucket algorithms are two different approaches to traffic shaping and rate limiting in computer networks.

The leaky bucket algorithm enforces a steady output rate by allowing input traffic to enter a bucket at a variable rate , but the bucket has a fixed capacity. When the bucket overflows, excess traffic is discarded. The average output rate is controlled by the rate at which the bucket leaks.

Leaky Bucket

The Leaky Bucket algorithm is a rate-limiting algorithm used to control the rate of data transmission over a network. The algorithm works by collecting data packets into a buffer, which is then processed and transmitted at a fixed rate. The buffer acts like a bucket that collects the incoming packets, and water leaks out of the bucket at a fixed rate, limiting the amount of data that can be transmitted.

Here's how the algorithm works:

Data packets are added to a buffer or queue as they arrive.
A clock ticks at regular intervals, representing the passage of time.
At each tick, the bucket leaks a fixed amount of data, equivalent to the maximum rate at which data can be transmitted.
If there is no data in the buffer, the bucket remains empty, and no data is transmitted.
If there is data in the buffer, the bucket leaks the fixed amount of data at each tick, transmitting packets in the order they were received.
If the buffer fills up and new packets arrive, they are dropped.

The Leaky Bucket algorithm ensures that the data transmission rate is consistent and does not exceed a predefined limit. It also helps to prevent network congestion by limiting the amount of data that can be transmitted at any given time.

One disadvantage of the Leaky Bucket algorithm is that it can result in data transmission delays, as packets are held in the buffer until they can be transmitted at the fixed rate. It may also not be ideal for real-time applications where data needs to be transmitted as soon as it arrives.

public class LeakyBucket {
    private final int capacity;
    private final int rate;
    private int size;
    private long lastLeakTime;

    public LeakyBucket(int capacity, int rate) {
        this.capacity = capacity;
        this.rate = rate;
        this.size = 0;
        this.lastLeakTime = System.currentTimeMillis();
    }

    public synchronized boolean add(int amount) {
        leak();
        if (size + amount > capacity) {
            return false; // Bucket is full, packet is dropped
        } else {
            size += amount;
            return true; // Packet is accepted
        }
    }

    private void leak() {
        long now = System.currentTimeMillis();
        long timeElapsed = now - lastLeakTime;
        int amountToLeak = (int) (timeElapsed * rate / 1000);
        if (amountToLeak > 0) {
            size = Math.max(0, size - amountToLeak);
            lastLeakTime = now;
        }
    }
}

Token bucket

The token bucket algorithm also controls traffic based on a fixed output rate, but instead of using a bucket to control the input, it uses tokens. A token bucket is initially filled with a fixed number of tokens, and when a packet arrives, a token is removed from the bucket to allow it to be transmitted. If there are no tokens in the bucket, packets are queued or discarded.

Both algorithms can be used to smooth out traffic and prevent network congestion by enforcing a maximum rate of traffic. They are often used in combination with other techniques such as Quality of Service (QoS) and traffic classification to control network traffic and prioritize certain types of traffic over others.

The token bucket algorithm is a simple algorithm that can be used for rate limiting in distributed systems. It works by maintaining a bucket of tokens, where each token represents a unit of work, such as an API request or a database query. The bucket has a fixed capacity, and tokens are added to the bucket at a fixed rate. When a client wants to perform an action that requires a token, it must first acquire a token from the bucket. If the bucket is empty, the client must wait until a token becomes available.

Here's how the token bucket algorithm works:

Initialize the token bucket: The token bucket has a fixed capacity, which represents the maximum number of tokens that can be stored in the bucket at any given time. The bucket is initially empty, and tokens are added to the bucket at a fixed rate.
Add tokens to the bucket: At a fixed interval, tokens are added to the bucket at a fixed rate. For example, if the rate is 10 tokens per second, one token will be added to the bucket every 100 milliseconds.
Acquire a token: When a client wants to perform an action that requires a token, it must first acquire a token from the bucket. If the bucket is empty, the client must wait until a token becomes available.
Perform the action: Once the client has acquired a token, it can perform the action that requires a token, such as making an API request or a database query.
Repeat: The client can repeat the process of acquiring a token and performing the action as needed.

The token bucket algorithm has several advantages:

It is simple to implement and understand.
It allows bursts of requests, as long as the bucket has tokens available.
It provides a fixed rate of requests over time.

However, it also has some limitations:

It does not account for varying request sizes or complexities.
It may not be suitable for scenarios where strict concurrency limits are required.
It can waste tokens if they are not used, as they expire and are replaced by newer tokens.

public class TokenBucket {
    private int capacity;
    private int tokens;
    private long lastRefillTimestamp;
    private long refillInterval;

    public TokenBucket(int capacity, int tokensPerSecond) {
        this.capacity = capacity;
        this.tokens = capacity;
        this.refillInterval = 1000L / tokensPerSecond;
        this.lastRefillTimestamp = System.currentTimeMillis();
    }

    public synchronized boolean tryConsume() {
        refill();
        if (tokens > 0) {
            tokens--;
            return true;
        } else {
            return false;
        }
    }

    private void refill() {
        long now = System.currentTimeMillis();
        long timeSinceLastRefill = now - lastRefillTimestamp;
        int tokensToAdd = (int) (timeSinceLastRefill / refillInterval);
        if (tokensToAdd > 0) {
            tokens = Math.min(capacity, tokens + tokensToAdd);
            lastRefillTimestamp = now;
        }
    }
}

12. Geohash

Geohash is a geocode system invented in 2008 by Gustavo Niemeyer that encodes a geographic location into a short string of letters and digits . The encoding represents a rectangular bounding box that contains the location and is recursively subdivided into smaller and smaller cells, each identified by a unique geohash code. The more characters included in the geohash, the more precisely it defines the location. Geohashes are used in applications that require location-based searches, such as mapping and location-based services. They are also used for indexing and querying geospatial data in databases and search engines.

13. Quadtree

A quadtree is a tree data structure used in computer graphics , geographic information systems (GIS), and other applications that require efficient spatial indexing and querying of 2D or 3D points, rectangles or other geometric primitives. A quadtree recursively subdivides a 2D or 3D space into four equally sized quadrants or regions, each represented by a quadtree node. Each node can be further subdivided until a leaf node is reached. The leaf node contains the actual data or objects that are being indexed and queried. The quadtree provides a fast and efficient way of searching for points or objects within a rectangular area or for nearest neighbors. It is commonly used in GIS for indexing maps, as well as in computer graphics for collision detection, frustum culling, and other spatial operations. The quadtree is a special case of a recursive partitioning tree, where each internal node is divided into a fixed number of subregions or children.

Quadtrees are trees used to efficiently store data of points on a two-dimensional space. In this tree, each node has at most four children. We can construct a quadtree from a two-dimensional area using the following steps:

Divide the current two dimensional space into four boxes.
If a box contains one or more points in it, create a child object, storing in it the two dimensional space of the box
If a box does not contain any points, do not create a child for it
Recurse for each of the children.

There are some differences to consider when choosing between using a geohash or a quadtree for spatial indexing and searching:

Precision: Geohashes provide variable precision by adjusting the length of the hash string, while a quadtree subdivides the space into fixed-sized cells. Thus, geohashes are more flexible in terms of precision but may result in more computational overheads.
Indexing Performance: Quadtree is the more efficient structure in terms of indexing performance, as it guarantees that each cell in the space is covered by at most one node. This property directly translates to faster traversal times and lower memory usage, resulting in faster indexing of spatial data.
Search Performance: Geohash is faster for point search as each cell is represented by a unique hash value. This means that, in principle, the search query only needs to match against the hash values to identify relevant nodes. On the other hand, quadtree may be more efficient for range search as it allows for direct traversal of relevant cells. Overall, the choice between geohash and quadtree is dependent on the use case and the specific spatial requirements of the application.

14. Leader election

In distributed computing, leader election is the process of designating a single process as the coordinator of a task distributed among several computers or nodes . The goal of leader election is to break the symmetry among nodes and select a unique leader node that will organize and coordinate the task. Leader election algorithms are designed to be efficient and economical in terms of message passing and computation. There are several strategies for leader election, including comparisons of unique and comparable identities, randomized leader election, and the use of secret sharing or quorum systems. Leader election is a fundamental problem in distributed computing and has applications in various fields, including cloud computing, blockchain, and distributed databases.

There are several algorithms that can be employed to select a leader in such systems, and I shall mention a few:

Bully Algorithm: In this algorithm, a process with the highest priority (usually determined by process ID) becomes the leader. When a process detects the absence of a leader, it initiates an election by sending an “Election” message to processes with higher priority. If it receives no response, it declares itself the leader and sends a “Victory” message to all lower-priority processes.
Ring Algorithm: In this approach, processes are arranged in a logical ring. When a process detects the absence of a leader, it initiates an election by sending an “Election” message containing its process ID to its neighbor in the ring. Each process forwards the message, updating the process ID if it encounters a higher one. When the message returns to the initiator, the process with the highest ID is declared the leader.
Paxos Algorithm: This algorithm is based on a consensus protocol that ensures agreement among processes in the presence of failures. It involves three roles: proposers, acceptors, and learners. Proposers suggest new leader values, acceptors vote on these values, and learners learn the chosen value. The algorithm guarantees that only one value is chosen, and that value is the new leader.
Raft Algorithm: Raft is a consensus algorithm designed to be more understandable than Paxos. It divides time into terms, and each term has a leader. The leader is responsible for managing the replicated log and ensuring consistency. If a leader fails, a new election is held, and a new leader is chosen for the next term.

15. Consensus (Raft/Paxos)

Both Raft and Paxos are leader election algorithms used in distributed computing to elect a single process as the leader or coordinator of a task distributed among several nodes. The leader is responsible for the organization and coordination of the task, thus making it a fundamental problem in distributed computing.

Paxos is a family of consensus algorithms that guarantee the consistency of replicated state machines in a distributed system. The algorithm uses a proposal and acceptance mechanism to ensure that only one value or operation can be chosen as the final agreed-upon value. Paxos is designed to tolerate up to a third of the nodes failing or behaving maliciously.

Raft, on the other hand, is an alternative consensus algorithm to Paxos. It was designed to be simpler and more understandable than Paxos. Raft works by electing a leader among the nodes, which is responsible for managing the replicated state machine. Unlike Paxos, Raft uses a heartbeat mechanism to allow nodes to detect a failed leader and elect a new one.

Overall, both Paxos and Raft provide fault-tolerance and performance guarantees for consensus-based distributed systems, but Raft is considered to be more understandable and easier to implement than Paxos.

PreviousReverse index, Inverted index, Trie, Rsync, Merkle tree NextTime sync, Erasure coding, Message digest, Atomic commit, Mutual exclusion

Last updated 1 year ago