Break Your Database: The Merits of Database Sharding

By | In Blog, Database Administration | October 27th, 2013

Database sharding is a data-partitioning scheme spreading data across various servers in a distributed fashion.

Also known as “shared-nothing” database partitioning; the concept was developed by Google and has gained popularity among enterprises. Among those other organizations adopting sharding include Amazon, Skype, YouTube, Facebook, and Wikipedia.

Sharding, simply put, divides a database into parts, or shards. Each of these shards can be hosted on a different server. The main benefit to using this technique is a boost in performance, which is the result of the technology using a distributed approach to database storage and access. Some IT professionals refer to it as horizontal scaling.

As Amazon explains in a patent recently issued for its approach:

“In relational database management systems, data is organized into tables containing rows and columns. Each row corresponds to an instance of a data item, and each column corresponds to an attribute for the data item. Sharding produces partitions by rows instead of columns. Through partitioning, the data in a single table may be spread among potentially many different physical data stores, thereby improving scalability.”

The task of the database professional or programmer implementing such a strategy is to establish rules that explicitly state those machines on which a piece of data will be stored.

If you have a database that is in four shards, for example, as db Shards, a blog by software development firm CodeFutures explains, each of these shards could be four different and separate MySQL instances. Each of these shards is hosted on its own server. As an example, each shard could have a limit of 1,000 connections with 800 concurrent transactions. Because the queries are distributed, each server will, on average, be able to process four times the number of concurrent requests.

Theo Schlossnagle, president and chief executive officer for OmniTI Computer Consulting, says the approach isn’t new. He defines sharding as:

“Sharding is the act of creating shards. Somehow, somewhere somebody decided that what they were doing was so cool that they had to make up a new term for what people have been doing for many, many years. It is partitioning… [S]ometimes that partitioning is proper federation. You don’t need a cool name to effectively accomplish what’s been around for a long time. More so, you don’t need a name that implies you broke something irreparably.”

Whatever you choose to call the approach, this shared-nothing approach to database partitioning may have appreciable benefits worth investigating for your organization.

Image by Paul Hammond; spinner image by H.Adam.

Contact Us
Eric Russo
Senior Vice President of Database Services
Eric Russo is SVP of Database Services overseeing all of Datavail’s database practices including project and managed services for MS SQL, Oracle, Oracle EBS, MySQL, MongoDB, SharePoint and DB2. He is also the Product Owner for Datavail Delta, a database monitoring tool. He has 21 years’ experience in technology including 16 years in database management. His management success and style has attracted top DBAs from around the world to create one of the most talented and largest SQL Server teams. He has been with Datavail since 2008: previous to that his work experiences include DBA Manager at StrataVia, Senior Web Developer at Manifest Information Systems and SQL Server DBA at Clark County, Nevada.

Leave a Reply

Your email address will not be published.
Required fields are marked (*).