Break Your Database: The Merits of Database Sharding
Author: Eric Russo | 3 min read | October 27, 2013
Database sharding is a data-partitioning scheme spreading data across various servers in a distributed fashion.
Also known as “shared-nothing” database partitioning; the concept was developed by Google and has gained popularity among enterprises. Among those other organizations adopting sharding include Amazon, Skype, YouTube, Facebook, and Wikipedia.
Sharding, simply put, divides a database into parts, or shards. Each of these shards can be hosted on a different server. The main benefit to using this technique is a boost in performance, which is the result of the technology using a distributed approach to database storage and access. Some IT professionals refer to it as horizontal scaling.
As Amazon explains in a patent recently issued for its approach:
“In relational database management systems, data is organized into tables containing rows and columns. Each row corresponds to an instance of a data item, and each column corresponds to an attribute for the data item. Sharding produces partitions by rows instead of columns. Through partitioning, the data in a single table may be spread among potentially many different physical data stores, thereby improving scalability.”
The task of the database professional or programmer implementing such a strategy is to establish rules that explicitly state those machines on which a piece of data will be stored.
If you have a database that is in four shards, for example, as db Shards, a blog by software development firm CodeFutures explains, each of these shards could be four different and separate MySQL instances. Each of these shards is hosted on its own server. As an example, each shard could have a limit of 1,000 connections with 800 concurrent transactions. Because the queries are distributed, each server will, on average, be able to process four times the number of concurrent requests.
Theo Schlossnagle, president and chief executive officer for OmniTI Computer Consulting, says the approach isn’t new. He defines sharding as:
“Sharding is the act of creating shards. Somehow, somewhere somebody decided that what they were doing was so cool that they had to make up a new term for what people have been doing for many, many years. It is partitioning… [S]ometimes that partitioning is proper federation. You don’t need a cool name to effectively accomplish what’s been around for a long time. More so, you don’t need a name that implies you broke something irreparably.”
Whatever you choose to call the approach, this shared-nothing approach to database partitioning may have appreciable benefits worth investigating for your organization.
Image by Paul Hammond; spinner image by H.Adam.