VARCHAR versus Compression
Author: Craig Mullins | | October 27, 2015
Earlier on this blog I posted a blurb giving advice on using variable character columns in DB2. After thinking about the topic a little bit more, I decided to post a follow-on topic: namely, comparing the use of VARCHAR to DB2 compression.
Even though these are two entirely different “things,” they are each probably done for similar reasons – to save disk storage. VARCHAR does this by adjusting the size of the column to fit the actual length of text being stored; compression does this by sending rows of data through an algorithm to minimize its length. For those interested in the details of compression I refer you to Willie Favero’s excellent blog where he has written a several-part series on compression — here are the links to it:part one, part two, and part three.
So, what advice can I give on comparing the two? Well, you might want to consider forgoing the use of variable columns and instead turn on compression. With variable columns you always add overhead: there is a two-byte prefix for every VARCHAR column to store the length of the VARCHAR. If instead you use CHAR and turn on compression you no longer need the extra two bytes per row per variable column.
Also, keep in mind that compression reduces the size of the entire row. So not only will you be compressing the CHAR column (that used to be VARCHAR), but you will also give DB2 the opportunity to compress every other column in that row.
All in all, that means that comrpession can return better disk storage savings than variable columns, and all without the programmatic overhead of having to calculate and store the two-byte prefix for each previously variable column.
Of course, I don’t want to give the impression that this should always be done… (remember the DBA’s maxim: Almost never say “always or never.”) And there are additional things to consider, such as:
- Compression adds a compression dictionary to the table space so a compressed table space can actually be larger than a non-compressed table space (if it is very small to begin with).
- Compression requires additional CPU cycles to compress and de-compress the data as it is inserted, modified, and read (of course, I/O can decrease because smaller rows will fit more on each page, so degraded CPU performance can be offset by improved I/O)
This is just an additional “thing to consider” when you are building your DB2 databases and trying to decide whether you should use VARCHAR or CHAR…
This post was originally posted on Craig Mullins’ blog at http://db2portal.blogspot.com/2006/08/varchar-versus-compression.html
Related Posts
How to Solve the Oracle Error ORA-12154: TNS:could not resolve the connect identifier specified
The “ORA-12154: TNS Oracle error message is very common for database administrators. Learn how to diagnose & resolve this common issue here today.
Data Types: The Importance of Choosing the Correct Data Type
Most DBAs have struggled with the pros and cons of choosing one data type over another. This blog post discusses different situations.
How to Recover a Table from an Oracle 12c RMAN Backup
Our database experts explain how to recover and restore a table from an Oracle 12c RMAN Backup with this step-by-step blog. Read more.