As data gathering becomes an around-the-clock necessity for competitive organizations, the question of data retention arises.
How long is too long to keep bits and bytes of potentially obsolete information around? When the period for legally retaining data has passed, should you purge your data?
The answer is an emphatic, “yes!” To ensure your databases continue functioning effectively and successfully, the periodic purging of infrequently accessed data is a must. Without it, database reporting takes longer and systems dependent on that data, such as business intelligence systems, also crawl to a halt.
Once data is no longer needed for specific legal or business requirements, it should be purged. As my colleague Chuck Ezell explained to Abe Selig of Plotting Success, “It’s one of the first things I look for. … What’s your data retention policy, and do you need 10 years’ worth of data in your tables?”
Data retention policies vary by organization based on the type of business they are conducting. Ezell explains:
Publicly traded companies are definitely going to have a five-, 10-, maybe even a 15-year retention policy, particularly for data such as accounting records that they may need to show SOX [Sarbanes-Oxley Act] compliance.
When data is not purged regularly, the tables in a database can grow so large that performance begins to deteriorate.
What Data Is Necessary?
There are several strategies that can be employed to remove data. Ezell says the first data to be purged is any created without human interaction, such as Web server logs or call detail records.
Most organizations need to keep data accessible for roughly 24 months. After that period, some older data may need to be periodically archived. Any nonessential data can be a candidate for purging.
Others advocate creating a procedure or framework for data purging. This might include using a tool or script to effectively automate the process. Some database management systems include command-driven utilities that are simple to use.
The data purging process also necessitates establishing some sort of definitions or schedules for various types of data. This applies not only to data currently being used, but also dormant information, such as older transactions.
Although data purging may be a simple process, changes can affect parts of the remaining system, such as tables. Retaining pertinent data is important to ensure these continue to function properly. This is particularly true for relational database systems that may rely on some pieces of data across tables.
Benefits Beyond Speed
Purging, if properly completed on a routine basis, can accelerate database reporting. Ezell notes:
A recent client of mine was experiencing reporting slowdowns and needed to purge data and reorganize his tables. … After we did that — and it took 36 hours of work, as his tables were huge and took large amounts of time to reorganize — he benefited from a 20-30 percent speed increase, simply from purging old data out of his system.
What’s the ultimate benefit beyond performance? Sanjay Joshi, writing on SQL Server Pro explains, “companies can … more precisely estimate future storage and server needs. Thus, ROI will improve for existing hardware, maintenance, and future hardware requisitions.”
Image by exopixel/123RF.
The “ORA-12154: TNS:could not resolve the connect identifier specified” Oracle error is a commonly seen message for database administrators.