Partitioning is a database design technique with several benefits, chief among them shorter data read times, which in turn reduce overall response time. How data is physically stored is one of several choices a database administrator can make to improve an application’s ability to access data.
“It has long been known that database partitioning is the answer to improving the performance and scalability of relational databases,” states CodeFutures, a software development firm.
What are the partitioning options available to database administrators? There are two primary approaches: horizontal and vertical partitioning.
In horizontal partitioning, a table is split by rows. Each partition is a physical, row-based dataset, and a query can address a single partition, several partitions, or all of them.
“All columns defined to a table are found in each set of partitions so no actual table attributes are missing,” writes Robin Schumacher, the director of product management for MySQL. “An example of horizontal partitioning might be a table that contains ten years worth of historical invoice data being partitioned into ten distinct partitions, where each partition contains a single year’s worth of data.”
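The invoice example can be sketched in a few lines of Python. This is only an illustration under stated assumptions: it uses an in-memory SQLite database, the table names (`invoices_2015` through `invoices_2024`), the column layout, and the helper functions are all hypothetical, and the routing is done by hand in application code. Many database engines offer built-in partitioning that handles this routing for you.

```python
import sqlite3

# Hypothetical setup: one physical table per invoice year, all with identical columns.
YEARS = range(2015, 2025)  # ten yearly partitions (assumed range)

def create_partitions(conn):
    for year in YEARS:
        conn.execute(
            f"CREATE TABLE invoices_{year} "
            "(id INTEGER PRIMARY KEY, invoice_date TEXT, amount REAL)"
        )

def partition_for(invoice_date):
    """Pick the partition from the year prefix of an ISO date string."""
    year = int(invoice_date[:4])
    if year not in YEARS:
        raise ValueError(f"no partition for year {year}")
    return f"invoices_{year}"

def insert_invoice(conn, invoice_id, invoice_date, amount):
    # Every row lands in exactly one partition, chosen by its date.
    conn.execute(
        f"INSERT INTO {partition_for(invoice_date)} VALUES (?, ?, ?)",
        (invoice_id, invoice_date, amount),
    )

def total_for_year(conn, year):
    # A query about one year only reads that year's partition.
    row = conn.execute(
        f"SELECT COALESCE(SUM(amount), 0) FROM invoices_{year}"
    ).fetchone()
    return row[0]

def total_all_years(conn):
    # A query about all years has to visit every partition.
    return sum(total_for_year(conn, year) for year in YEARS)

conn = sqlite3.connect(":memory:")
create_partitions(conn)
insert_invoice(conn, 1, "2015-03-01", 100.0)
insert_invoice(conn, 2, "2024-11-15", 250.0)
insert_invoice(conn, 3, "2024-12-01", 50.0)
```

The point of the sketch is the difference between `total_for_year`, which reads a single partition, and `total_all_years`, which fans out across all ten.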
Vertical partitioning is exactly what it sounds like: the data in a table is split vertically, by column. Each partition holds a subset of the table’s columns and includes all rows. This approach is often used to set aside so-called BLOB (binary large object) data, a collection of binary data stored as a single entity in a database.
“An example of vertical partitioning might be a table that contains a number of very wide text or BLOB columns that aren’t addressed often being broken into two tables that has the most referenced columns in one table and the seldom-referenced text or BLOB data in another,” explains Schumacher.
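A minimal sketch of that split, again using in-memory SQLite from Python, might look like the following. The table names (`article_core`, `article_blob`), columns, and helper functions are assumptions made up for illustration; the two tables share a primary key so the full row can be reassembled with a join when it is actually needed.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Frequently referenced columns live in a narrow table.
CREATE TABLE article_core (id INTEGER PRIMARY KEY, title TEXT, author TEXT);
-- The wide, seldom-referenced BLOB column lives in its own table.
CREATE TABLE article_blob (
    id INTEGER PRIMARY KEY REFERENCES article_core(id),
    body BLOB
);
""")

def add_article(conn, article_id, title, author, body):
    # One logical row is written as two physical rows sharing the same id.
    conn.execute("INSERT INTO article_core VALUES (?, ?, ?)",
                 (article_id, title, author))
    conn.execute("INSERT INTO article_blob VALUES (?, ?)", (article_id, body))

def list_titles(conn):
    # The common query touches only the narrow table.
    return [row[0] for row in
            conn.execute("SELECT title FROM article_core ORDER BY id")]

def load_full(conn, article_id):
    # The rare query joins the two partitions back together.
    return conn.execute(
        "SELECT c.title, c.author, b.body "
        "FROM article_core c JOIN article_blob b ON b.id = c.id "
        "WHERE c.id = ?",
        (article_id,),
    ).fetchone()

add_article(conn, 1, "Partitioning basics", "A. Author", b"\x00" * 1024)
add_article(conn, 2, "RAID levels", "B. Author", b"\x01" * 2048)
```

The trade-off is visible in the two read paths: `list_titles` never reads the BLOB data, while `load_full` pays for a join.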
One partitioning strategy is to segregate data by time, for example with one partition per month or year. Another is to separate data that is being actively used from data that is rarely accessed.
Partitioning does not merely mean the data is physically separated, explains Theo Schlossnagle, chief executive officer of OmniTI Computer Consulting. “It is often used to simply split our data up so that more hardware can be leveraged to process it. Google’s information, for example, is partitioned all over the place and then they ask all the system components (servers) to participate in answering questions.”
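The idea of asking every component to participate can be illustrated with a toy scatter-gather query. This is a rough sketch, not a description of how Google or any real system works: the "servers" are plain in-process dictionaries, the partition count, the CRC32-based placement, and all function names are assumptions chosen for illustration.

```python
import zlib
from concurrent.futures import ThreadPoolExecutor

NUM_PARTITIONS = 4  # stand-ins for separate servers (assumed count)
partitions = [dict() for _ in range(NUM_PARTITIONS)]

def partition_index(key):
    # CRC32 gives a placement that is stable across runs,
    # unlike Python's built-in hash() on strings.
    return zlib.crc32(key.encode("utf-8")) % NUM_PARTITIONS

def put(key, value):
    # Each key is stored on exactly one partition.
    partitions[partition_index(key)][key] = value

def count_matching(partition, predicate):
    # The work each "server" does on its own slice of the data.
    return sum(1 for value in partition.values() if predicate(value))

def scatter_gather_count(predicate):
    # Scatter: send the question to every partition in parallel.
    # Gather: combine the partial answers into one result.
    with ThreadPoolExecutor(max_workers=NUM_PARTITIONS) as pool:
        partial_counts = pool.map(
            lambda part: count_matching(part, predicate), partitions
        )
        return sum(partial_counts)

for i in range(100):
    put(f"doc-{i}", i)
```

Calling `scatter_gather_count(lambda v: v % 2 == 0)` asks every partition to count its even values and then adds up the partial results, so each partition only scans its own share of the data.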
Which approach should you adopt? As Schlossnagle sagely observes:
“The techniques for choosing on which component to store a particular piece of data are wildly varying, each with its own advantages and disadvantages. Understanding how you will be storing data and more importantly what questions you will be asking over the data set dictate the partitioning scheme that is most appropriate.”