Database Sharding and Consistent Hashing

What is database sharding? Why do we need it? How do we achieve it? What are the types of database sharding, and what challenges does it bring? What is consistent hashing? And more…

The problem we are trying to solve

With the world becoming ‘one large stage’ and applications set to mimic the same, scaling is, and always will be, a primary problem to solve and at the forefront of all design thinking for systems built for mass-scale use. As applications continue to scale, the persistent storage layer needs to scale accordingly. This is therefore the best time to introduce the types of database scaling:

Vertical Scaling

When additional resources are added to an existing system, the increase in capability is called vertical scaling. This may involve adding more storage, more CPU and so on to enable the existing system to handle greater and more complex queries. While this helps with availability for some time, it is not possible to keep increasing the capacity and reliability of a single machine indefinitely. Over-provisioning of resources is a major drawback of vertical scaling, as are the limits on how far a machine can scale before application performance is impacted.

Horizontal Scaling

This is something that can be accomplished in more than one way.

1. Using instance replication — All DB queries are separated into read and write operations. The primary DB instance handles the writes, while the read-only replicas share the read load and increase availability. If the primary DB goes down, one of the read replicas may be promoted to primary until a new instance can be introduced, as the needs of the application dictate. This type of horizontal scaling, however, compromises data consistency, as there may be a substantial lag between when the primary DB is updated and when all the read replicas reflect the same change. Essentially, in terms of the CAP (Consistency, Availability, Partition tolerance) theorem, A and P are well taken care of while C (consistency) takes the hit. There are ways to navigate and minimise the lack of consistency, but they in turn have an impact elsewhere on the system’s performance.

2. Adding a cache layer in front of the DB — Caching is a well-known strategy that substantially reduces latency, as caches are inherently fast (if equally expensive) memory. With the most frequent queries cached, the number of hits to the DB can be reduced considerably, thereby reducing the load on it. Caches can be read-through or write-through based on the system requirements, but that is outside the scope of our discussion, so a topic for another time!

3. And finally, Database sharding …

… an approach we shall discuss in greater detail below.

Database sharding

Sharding is the process of partitioning data so that different instances hold different subsets of the same database. There, that was pretty simple! The concept does introduce extra overhead in working out which data sits where, but it is a great technique for reducing the load on a single server. Sharding, like scaling, can be horizontal or vertical, as explained below.

Horizontal sharding — Horizontal partitioning refers to the division of the rows of a table into multiple tables, known as partitions. The number of columns, and therefore the structure/schema of the database, stays the same in all partitions. Horizontal sharding is effective when queries tend to return a subset of rows that are often grouped together. For example, queries that filter data on short date ranges or ID ranges are ideal for horizontal sharding, since they limit querying to only a subset of the servers. This type of sharding is therefore better suited to OLAP queries than to OLTP.

Vertical sharding — Vertical partitioning, on the other hand, refers to the division of columns into multiple tables. It is effective when queries tend to return only a subset of the columns of the data. For example, if some queries request only names and others request only addresses, then the names and addresses can be sharded onto separate servers, distributing the load and doing away with the need to separate and then re-join the data. This also implies that vertical sharding is better suited to OLTP than to OLAP, as it makes it difficult to compare and analyse data in the absence of a complete ‘picture’ of a specific entity.
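
To make the two styles concrete, here is a minimal Python sketch using an invented users table; the shard names and the id boundary are assumptions for illustration only.

```python
# Toy illustration of horizontal vs. vertical sharding on an invented
# "users" table. Shard names and the id boundary are made up.

users = [
    {"id": 1, "name": "Ada",   "address": "12 Byron St"},
    {"id": 2, "name": "Grace", "address": "90 Navy Rd"},
    {"id": 3, "name": "Linus", "address": "1 Kernel Ave"},
]

# Horizontal sharding: whole rows are split across shards (here, by id range);
# every shard keeps the full schema.
horizontal = {
    "shard_a": [row for row in users if row["id"] <= 2],
    "shard_b": [row for row in users if row["id"] > 2],
}

# Vertical sharding: columns are split across shards; each shard keeps the id
# so a full row can still be re-joined if ever needed.
vertical = {
    "names_shard":     [{"id": r["id"], "name": r["name"]} for r in users],
    "addresses_shard": [{"id": r["id"], "address": r["address"]} for r in users],
}

print(horizontal["shard_a"])    # rows with id <= 2, all columns
print(vertical["names_shard"])  # all rows, id + name columns only
```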

Considerations for sharding

Replication, reliability and a maintenance strategy are some of the most important considerations when deciding on a sharding architecture. A system in which the data frequently changes, for example, requires a different architecture from one that mainly handles read requests. Most sharded databases use one of the following four architectures (a small sketch after the list illustrates the first three in code):

  • Range Sharding : Data is divided into ranges of the sharding key, and each range maps directly to a different shard. The sharding key should ideally be immutable; if the key changes, the shard must be recalculated and the record copied to the new shard, otherwise the mapping is broken and the record’s location could be lost. Range sharding is also known as dynamic sharding.
  • Hashed Sharding : Hash-based sharding also uses the shard key to determine which shard a record is assigned to. However, instead of mapping the key directly to a shard, it applies a hash function to the shard key. A hash function transforms one or more data points into a new value that lies within a fixed-size range. This is discussed in much greater detail later in this article, where we focus on the concept of consistent hashing.
  • Directory-Based Sharding : Directory-based sharding groups related items together on the same shard. It is accomplished through a static lookup table containing a mapping between each possible value of the sharding field and its designated shard. Each key maps to exactly one shard and must appear in the lookup table exactly once, although many keys may be mapped to the same shard.
  • Geographic-Based Sharding : Geo-sharding is a specific type of directory-based sharding where data is divided amongst the shards based on the location of the entry, which relates to the location of the server hosting the shard.
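
As a rough illustration of the first three architectures, the sketch below routes keys to shards in Python; the shard names, ranges and directory entries are invented for the example, and MD5 merely stands in for whatever hash function a real system would use.

```python
import hashlib

NUM_SHARDS = 4  # assumed cluster size for the hashed example

# Range sharding: explicit key ranges map directly to shards.
RANGES = [(0, 1000, "shard_0"), (1000, 2000, "shard_1"), (2000, 3000, "shard_2")]

def range_shard(user_id: int) -> str:
    for low, high, shard in RANGES:
        if low <= user_id < high:
            return shard
    raise KeyError(f"no range covers id {user_id}")

# Hashed sharding: a hash of the shard key decides the shard.
def hash_shard(key: str) -> str:
    digest = int(hashlib.md5(key.encode()).hexdigest(), 16)
    return f"shard_{digest % NUM_SHARDS}"

# Directory-based sharding: a static lookup table holds the mapping; each key
# appears exactly once, but many keys may share a shard.
DIRECTORY = {"EU": "shard_0", "US": "shard_1", "APAC": "shard_1"}

def directory_shard(region: str) -> str:
    return DIRECTORY[region]

print(range_shard(1500), hash_shard("user-42"), directory_shard("APAC"))
```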

Advantages of sharding

  • Increased read/write throughput — By distributing the dataset across multiple shards, both read and write operation capacity is increased as long as read and write operations are confined to a single shard.
  • Increased storage capacity — Similarly, by increasing the number of shards, you can also increase overall total storage capacity, allowing near-infinite scalability.
  • High availability — Finally, shards provide high availability in two ways. First, since each shard can be run as a replica set, every piece of data is replicated. Second, because the data is distributed, even if an entire shard becomes unavailable the database as a whole still remains partially functional, with part of the data served from the remaining shards.

Disadvantages of sharding

  • Query overhead — Each sharded database must have a separate machine or service which understands how to route a querying operation to the appropriate shard. This introduces additional latency on every operation. Furthermore, if the data required for the query is horizontally partitioned across multiple shards, the router must query each shard and merge the results together. This can make an otherwise simple operation quite expensive and slow down response times.
  • Complexity of administration — With a single unsharded database, only the database server itself requires upkeep and maintenance. With every sharded database, on top of managing the shards themselves, there are additional service nodes to maintain. Plus, in cases where replication is being used, any data updates must be mirrored across each replicated node. Overall, a sharded database is a more complex system which requires more administration.
  • Increased infrastructure costs — Sharding by its nature requires additional machines and compute power over a single database server. While this allows your database to grow beyond the limits of a single machine, each additional shard comes with higher costs. The cost of a distributed database system, especially if it is missing the proper optimisation, can be significant.

Having considered the pros and cons, let’s move forward and discuss implementation.

How sharding works

Now let’s talk about sharding in general and how it works. Clearly, from the above examples, with sharding we need an algorithm that helps us define where to find each piece of information. For example, if we have data items D1, D2, D3 and D4 and servers S0 and S1, we need a mapping of which server stores which data. This is accomplished through a suitable hash function. A hash function is a function that maps one piece of data, typically describing some kind of object and often of arbitrary size, to another piece of data, typically an integer, known as the hash code, or simply the hash.

One of the ways hashing can be implemented in a distributed system is by taking the hash modulo the number of servers. In this case, the placement may be defined as server_number = hash(key) mod N, where N is the number of servers the data needs to be distributed across.
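
A minimal sketch of this placement rule in Python, assuming the D1..D4 keys and the two servers from the example; MD5 is used only to get a hash that is stable across runs.

```python
import hashlib

def stable_hash(key: str) -> int:
    # Python's built-in hash() is salted per process for strings,
    # so we use MD5 to get a value that is stable across runs.
    return int(hashlib.md5(key.encode()).hexdigest(), 16)

def server_for(key: str, n_servers: int) -> int:
    # server_number = hash(key) mod N
    return stable_hash(key) % n_servers

for key in ["D1", "D2", "D3", "D4"]:
    print(key, "-> S%d" % server_for(key, 2))
```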

A point to note here is that the efficiency of this hashing scheme depends on the number of servers remaining constant, which is quite an assumption to make when servers may be added or removed all the time. To understand this better, let us add another DB server to our network. This changes N to 3, making the distribution look a bit like below:

With the addition of the new server, the already saved data has to be rehashed: D2 has to be moved to the new server S2, D3 has to be moved to S0 (previously on S1) and D4 to S1 (previously on S0). So it is clear that with the addition of a single server, a solution that was very straightforward to begin with causes a lot of chaos in terms of rehashing and moving a lot of data away from its original location.

Now, let’s remove a server again, which leads us back to the initial setup with two servers. This triggers re-hashing again and results in mass movement of data one more time. So, basically, each time a server is added or removed, most of the data has to be relocated, resulting in bulk movement of data and massive inefficiencies in the system.
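
To see the scale of the churn, the small simulation below (with invented keys) counts how many placements change when a third server joins a two-server cluster; typically roughly two thirds of the keys move.

```python
import hashlib

def stable_hash(key: str) -> int:
    return int(hashlib.md5(key.encode()).hexdigest(), 16)

def placement(keys, n_servers):
    # Assign every key to a server using hash(key) mod N.
    return {k: stable_hash(k) % n_servers for k in keys}

keys = [f"D{i}" for i in range(10_000)]   # 10,000 invented keys

before = placement(keys, 2)   # two servers
after = placement(keys, 3)    # a third server is added

moved = sum(1 for k in keys if before[k] != after[k])
print(f"{moved} of {len(keys)} keys changed servers "
      f"({100 * moved / len(keys):.0f}%)")
```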

As per Alex Xu in his book ‘System Design Interview’, this approach works well when the size of the cluster is fixed and the data distribution is even. But when new servers are added to meet new demand, or when existing servers are removed, it triggers a storm of misses and a lot of objects to be moved. For situations where servers constantly come and go (a more real-life scenario), this design is untenable. This brings us to the concept of consistent hashing.

Consistent Hashing

As per Wikipedia, consistent hashing is a special kind of hashing technique such that when a hash table is resized, only k/n keys need to be remapped on average where k is the number of keys and n is the number of servers. In contrast, in most traditional hash tables, a change in the number of array slots causes nearly all keys to be remapped because the mapping between the keys and the slots is defined by a modular operation.

The goal of consistent hashing is simple. We want almost all objects to stay assigned to the same server even as the number of servers changes.

Consistent Hashing is a distributed hashing scheme that operates independently of the number of servers or objects in a distributed hash table by assigning them a position on an abstract circle, or hash ring. So, we could take a key, compute its hash, and find out where it lies on the hash ring’s edge. This allows servers and objects to scale without affecting the overall system.

Using a hash function, we hash each server by its name or IP address, and place the server onto the ring. Next, we hash each object by its key with the same hashing function.

To locate the server for a particular object, we go clockwise from the location of the object’s key on the ring until a server is found. Continuing with our example, key k0 is on server s0 and key k1 is on server s1. Since we have the keys for both the objects and the servers on the same circle, we can define a simple rule to associate the former with the latter: each object key belongs to the server whose key is closest in the clockwise direction (or counterclockwise, depending on the convention used). In other words, to find out which server to ask for a given key, we locate the key on the circle and move clockwise until we find a server.

Now let’s add a server to the ring and see how that impacts the re-distribution of keys, if at all. We insert a new server s4 to the left of s0 on the ring. Note that only k0 needs to be moved from s0 to s4, because s4 is now the first server k0 encounters when going clockwise from k0’s position on the ring. Keys k1, k2 and k3 are not affected, as shown in the diagram below:

Next, let’s remove a server from the ring and see how that impacts the re-distribution of keys, if at all. We remove server s3 from the ring. Note that only k3 needs to be moved to s4, because s4 is the first server k3 encounters when going clockwise from k3's position on the ring. Keys k1, k2 and k0 are not affected, as shown in the diagram below:

From a programming perspective, we keep a sorted list of server hash values and walk this list (or binary-search it) to find the first server whose value is greater than, or equal to, that of the desired key. If no such value is found, we wrap around and take the first server in the list.
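
A minimal Python sketch of that lookup, assuming the s0..s3 server names from the example and using MD5 positions on the ring; a real implementation would add replication and virtual nodes on top.

```python
import bisect
import hashlib

def ring_hash(value: str) -> int:
    # Stable hash; the full MD5 range acts as the circle.
    return int(hashlib.md5(value.encode()).hexdigest(), 16)

class ConsistentHashRing:
    def __init__(self, servers):
        # Sorted ring positions and the servers sitting at them.
        self._positions = []
        self._servers = []
        for server in servers:
            self.add(server)

    def add(self, server: str) -> None:
        position = ring_hash(server)
        idx = bisect.bisect_left(self._positions, position)
        self._positions.insert(idx, position)
        self._servers.insert(idx, server)

    def remove(self, server: str) -> None:
        idx = self._positions.index(ring_hash(server))
        del self._positions[idx]
        del self._servers[idx]

    def server_for(self, key: str) -> str:
        # First server at or after the key's position, walking clockwise;
        # wrap around to the first server if we run off the end of the list.
        idx = bisect.bisect_left(self._positions, ring_hash(key))
        return self._servers[idx % len(self._servers)]

ring = ConsistentHashRing(["s0", "s1", "s2", "s3"])
print(ring.server_for("k0"))
ring.add("s4")                 # only keys that now land just before s4 move
print(ring.server_for("k0"))
```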

Virtual Nodes

One problem with the basic setup above is that the load is not evenly distributed across the servers. To ensure object keys are distributed evenly, or at least more appropriately, among servers, we apply a simple trick: we assign not one but many labels (positions on the ring) to each server. So instead of having nodes S0, S1, S2 and S3, we could have, say, S0_0 .. S0_9, S1_0 .. S1_9, S2_0 .. S2_9 and S3_0 .. S3_9, all interspersed along the circle. The factor by which we increase the number of labels, or virtual servers (server keys), is known as the weight. It depends on the situation, and may even differ per server, so as to adjust the probability of keys ending up on each. For example, if server S1 were twice as powerful as the rest, it could be assigned twice as many labels and, as a result, it would end up holding twice as many objects on average. To accomplish this, we hash each server under multiple labels (or with multiple hash functions) to get multiple placements on the ring.

In the above diagram, as we can see, multiple virtual nodes have been added for the different servers based on their capacity; three for S3 implies that more keys can be mapped to server s3 than to the other nodes in the network. Now, if a server, say S1, were to go down, all of its virtual nodes s1_x would disappear, meaning only the key k0 would need to be remapped to s0, via the s0_0 virtual node.
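
A sketch of how virtual nodes might be layered on top; the label counts per server below are assumptions, and a more powerful server simply gets more labels hashed onto the ring.

```python
import bisect
import hashlib
from collections import Counter

def ring_hash(value: str) -> int:
    return int(hashlib.md5(value.encode()).hexdigest(), 16)

class VirtualNodeRing:
    def __init__(self, weights):
        # weights maps server name -> number of virtual nodes (labels);
        # the counts used below are assumptions for the example.
        self._ring = []
        for server, weight in weights.items():
            for i in range(weight):
                # "s1_0", "s1_1", ... are server s1's virtual nodes.
                self._ring.append((ring_hash(f"{server}_{i}"), server))
        self._ring.sort()
        self._positions = [pos for pos, _ in self._ring]

    def server_for(self, key: str) -> str:
        idx = bisect.bisect_left(self._positions, ring_hash(key))
        return self._ring[idx % len(self._ring)][1]

# s1 is assumed to be twice as powerful, so it gets twice as many labels
# and should end up holding roughly twice as many keys on average.
ring = VirtualNodeRing({"s0": 10, "s1": 20, "s2": 10, "s3": 10})
load = Counter(ring.server_for(f"key-{i}") for i in range(10_000))
print(load)
```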

This is how consistent hashing solves the rehashing problem.

In general, only k/N keys need to be remapped, where k is the number of keys and N is the number of servers (more specifically, the maximum of the initial and final number of servers).

Conclusion

We observed that when using distributed caching to optimise performance, the number of caching servers may change (because a server crashes, or because servers are added or removed to increase or decrease overall capacity). By using consistent hashing to distribute keys between the servers, we can rest assured that, should that happen, the number of keys being rehashed, and therefore the impact on origin servers, will be minimised, preventing potential downtime or performance issues.

There are clients for several systems, such as Memcached, Redis, Cassandra, HBase, HDFS and MongoDB, that include support for consistent hashing out of the box. SQLite, Memcached, Zookeeper, MySQL and PostgreSQL are systems that don’t natively support sharding at the database layer but can do so via a middleware.
