Walking on Thin Ice: The Fallout from Bad Data

Author: Eric Russo | December 20, 2016

Data analysis is not a new phenomenon. It has a long history and has changed and matured over time. In our information age, advances in computing power have turned data analytics into a specialized science built on algorithms, software tools, and database technology. Enterprises today use business and data analytics to glean information about business trends, sales performance, lead sources, regional or seasonal demand, the gender profile of customers, customer satisfaction, and more.

Organizations use this information to:

Make business decisions based on patterns formed from existing data
Predict future trends by forecasting results from existing data
Use statistical analysis to determine why certain events happened or are currently taking place

The results we derive from data can be no more reliable than the raw data they come from. Incorrect, or simply put, bad data has a domino effect: it leads to unreliable insights, poor decisions, and lost business.

What exactly is bad data?

Many characteristics can mark data as bad and unreliable. Let’s look at a few of them:

Duplicated or triplicated data
Missing or incorrect information in fields
Incorrect data types
Incorrect data formats – dates stored in European date format and interpreted as U.S. format, for example
Email addresses that are malformed or that do not exist
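Several of the characteristics above can be caught with simple automated checks. The Python sketch below is illustrative only: the customer schema (`customer_id`, `name`, `email`, `signup_date`), the ISO 8601 date convention, and the helper name `find_problems` are all assumptions. The email pattern only flags malformed addresses; it cannot tell whether a mailbox actually exists. The last two lines show why an ambiguous date format is risky: the same string parses to two different days.

```python
import re
from datetime import datetime

# Deliberately loose pattern: something@something.something with no spaces.
# It flags malformed addresses only; it cannot tell whether a mailbox exists.
EMAIL_PATTERN = re.compile(r"[^@\s]+@[^@\s]+\.[^@\s]+")

# Hypothetical customer schema used for illustration.
REQUIRED_FIELDS = ("customer_id", "name", "email", "signup_date")


def find_problems(records):
    """Return (row_index, description) pairs for records that look bad.

    Assumes each record is a dict of scalar values, customer_id should be
    an int, and signup_date should be an ISO 8601 string (YYYY-MM-DD).
    """
    problems = []
    seen_ids = set()
    for i, rec in enumerate(records):
        # Missing information: absent key or empty value.
        for field in REQUIRED_FIELDS:
            if rec.get(field) in (None, ""):
                problems.append((i, f"missing {field}"))
        cid = rec.get("customer_id")
        if cid is not None:
            # Duplicated data: the same customer_id seen in an earlier row.
            if cid in seen_ids:
                problems.append((i, f"duplicate customer_id {cid}"))
            seen_ids.add(cid)
            # Incorrect data type.
            if not isinstance(cid, int):
                problems.append((i, "customer_id is not an integer"))
        # Invalid email format.
        email = rec.get("email")
        if email and not EMAIL_PATTERN.fullmatch(email):
            problems.append((i, f"invalid email {email!r}"))
        # Incorrect date format: must parse as YYYY-MM-DD.
        date = rec.get("signup_date")
        if date:
            try:
                datetime.strptime(date, "%Y-%m-%d")
            except ValueError:
                problems.append((i, f"bad date format {date!r}"))
    return problems


# Why ambiguous formats are dangerous: one string, two different days.
european = datetime.strptime("03/04/2016", "%d/%m/%Y")  # 3 April 2016
us = datetime.strptime("03/04/2016", "%m/%d/%Y")        # 4 March 2016
```

Running `find_problems` on a batch before it is loaded anywhere turns silent defects into a list someone can act on; the checks are cheap enough to run on every import.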

What causes bad data?

Bad data can be an endemic yet unrecognized problem in an application. It may stem from poor database design or from programming errors that go undetected before the application is released.

ETL tools often consolidate data from database tables and export it to intermediate formats such as CSV files. This is usually a precursor to loading the data into a data warehouse for further business analytics. Errors can creep in at this stage, for example through improper field mapping or field definitions, turning good data into bad.
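One way to catch mapping errors at the export stage is to read the exported file back and reconcile it against the source. The sketch below assumes a simple in-memory export; `export_to_csv`, `reconcile`, and the `column_map` dictionary are hypothetical stand-ins for whatever field mapping an ETL tool actually uses.

```python
import csv
import io


def export_to_csv(rows, column_map):
    """Write rows as CSV text. column_map maps CSV header -> source field name.

    column_map is a hypothetical stand-in for an ETL tool's field mapping.
    """
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=list(column_map))
    writer.writeheader()
    for row in rows:
        writer.writerow({header: row[field] for header, field in column_map.items()})
    return buf.getvalue()


def reconcile(rows, csv_text, expected_map):
    """Compare the exported CSV with the source using the mapping we intended.

    Returns a list of discrepancies; an empty list means the export matches.
    CSV stores everything as text, so source values are compared as strings
    (this assumes source values are non-None scalars).
    """
    exported = list(csv.DictReader(io.StringIO(csv_text)))
    issues = []
    if len(exported) != len(rows):
        issues.append(f"row count {len(exported)} != {len(rows)}")
    for i, (src, out) in enumerate(zip(rows, exported)):
        for header, field in expected_map.items():
            if out.get(header) != str(src[field]):
                issues.append(
                    f"row {i}: {header}={out.get(header)!r}, expected {str(src[field])!r}"
                )
    return issues


rows = [{"id": 1, "first": "Ada", "last": "Lovelace"}]
intended = {"CustomerID": "id", "FirstName": "first", "LastName": "last"}
# A mapping mistake: the first and last name fields are swapped.
faulty = {"CustomerID": "id", "FirstName": "last", "LastName": "first"}

print(reconcile(rows, export_to_csv(rows, intended), intended))  # []
print(reconcile(rows, export_to_csv(rows, faulty), intended))
```

With the faulty mapping, every exported value still looks plausible on its own, which is exactly the kind of silent corruption a row-by-row reconciliation is meant to expose.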

Apart from these sources of error, data can also be wrong simply because of incorrect entries at the clerical level. Such errors occur randomly and are much harder to prevent.
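Entry errors cannot be eliminated, but validating each field against a definition at the moment it is typed catches many of them before they reach the database. In this sketch the order-entry fields and their limits (`quantity`, `unit_price`, `state`) and the names `FieldRule` and `validate_entry` are hypothetical. Note what it cannot do: a plausible but wrong value, such as 100 keyed in instead of 10, passes every rule, which is why clerical errors remain hard to avoid.

```python
import re
from dataclasses import dataclass
from typing import Callable, Dict, List, Sequence


@dataclass(frozen=True)
class FieldRule:
    """One field definition: a field name, a check, and a hint for the clerk."""
    name: str
    check: Callable[[str], bool]
    hint: str


def whole_number_between(low: int, high: int) -> Callable[[str], bool]:
    """Build a check that accepts ASCII digits within [low, high]."""
    def check(text: str) -> bool:
        return re.fullmatch(r"[0-9]+", text) is not None and low <= int(text) <= high
    return check


# Hypothetical rules for an order-entry screen; real limits come from the business.
ORDER_RULES: Sequence[FieldRule] = (
    FieldRule("quantity", whole_number_between(1, 1000), "a whole number from 1 to 1000"),
    FieldRule("unit_price", lambda s: re.fullmatch(r"[0-9]+\.[0-9]{2}", s) is not None,
              "an amount with two decimal places, e.g. 19.99"),
    FieldRule("state", lambda s: re.fullmatch(r"[A-Z]{2}", s) is not None,
              "a two-letter state code, e.g. NV"),
)


def validate_entry(form: Dict[str, str], rules: Sequence[FieldRule] = ORDER_RULES) -> List[str]:
    """Return error messages; an empty list means the entry can be saved."""
    errors = []
    for rule in rules:
        value = form.get(rule.name, "").strip()
        if not rule.check(value):
            errors.append(f"{rule.name}: expected {rule.hint}, got {value!r}")
    return errors


print(validate_entry({"quantity": "10", "unit_price": "19.99", "state": "NV"}))  # []
print(validate_entry({"quantity": "100000", "unit_price": "19.99", "state": "Nevada"}))
```

Keeping the rules as data rather than scattered `if` statements means the same definitions can be reused wherever the data is entered or ported.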

What are the consequences?

At the OLTP (online transaction processing) level, bad data can send invoices to the wrong customer, delay payments to vendors, and, in extreme cases, charge consumers astronomical sums for products or services.

When it comes to decision-making or predictive analysis, bad data can lead to wrong and potentially costly decisions, and even to lawsuits. OLAP-based systems can yield seriously wrong conclusions if the underlying data is bad. One instructive example involves a hospital, where faulty practices in collecting, assimilating, and interpreting data caused multiple problems.
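A small worked example shows how bad data distorts an analytical rollup. Assuming hypothetical sales rows in which one order was loaded three times, a per-region revenue total makes the wrong region look like the leader until the duplicates are removed. The names `revenue_by_region` and `deduplicate` are illustrative, and deduplicating on `order_id` assumes that field uniquely identifies an order.

```python
from collections import defaultdict


def revenue_by_region(sales):
    """Sum sale amounts per region, the kind of rollup an OLAP cube performs."""
    totals = defaultdict(int)
    for sale in sales:
        totals[sale["region"]] += sale["amount"]
    return dict(totals)


def deduplicate(sales, key="order_id"):
    """Keep the first row for each key; assumes the key is unique per order."""
    seen = set()
    unique = []
    for sale in sales:
        if sale[key] not in seen:
            seen.add(sale[key])
            unique.append(sale)
    return unique


sales = [
    {"order_id": 101, "region": "West", "amount": 500},
    {"order_id": 102, "region": "East", "amount": 400},
    {"order_id": 102, "region": "East", "amount": 400},  # duplicate from a faulty load
    {"order_id": 102, "region": "East", "amount": 400},  # ...and a triplicate
]

print(revenue_by_region(sales))               # {'West': 500, 'East': 1200}
print(revenue_by_region(deduplicate(sales)))  # {'West': 500, 'East': 400}
```

Nothing in the aggregate itself signals a problem; East simply appears to outsell West threefold, which is how a clean-looking report can drive a wrong decision.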

How do we get it right?

Ensuring data integrity and quality is essential for:

Implementing operational efficiencies
Meeting statutory and regulatory requirements
Making sound business decisions to maximize revenue

Here are a few steps that help ensure data quality:

Define metadata and standard data definitions
Carry out a data audit
Validate data at the stage where data is generated and when it is ported to another format
Automate processes (e.g., for porting data) to the maximum extent possible
Test applications thoroughly
Enlist the services of companies focused on data quality
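A data audit can start with a simple profile of each column: how many values are missing, how many distinct values appear, and which value is most common. The sketch below is a minimal illustration rather than a full profiling tool; the `profile` function and the sample records are assumptions, and it expects hashable scalar values. In the sample, the audit surfaces two spellings of the same country (`US` and `USA`), the kind of inconsistency that agreed metadata and data definitions are meant to prevent.

```python
from collections import Counter


def profile(records, columns):
    """Per-column audit: missing values, distinct values, most common value."""
    report = {}
    for col in columns:
        values = [r.get(col) for r in records]
        # Treat None and empty strings as missing.
        present = [v for v in values if v not in (None, "")]
        counts = Counter(present)
        report[col] = {
            "missing": len(values) - len(present),
            "distinct": len(counts),
            "most_common": counts.most_common(1)[0] if counts else None,
        }
    return report


records = [
    {"id": 1, "country": "US", "email": "a@example.com"},
    {"id": 2, "country": "USA", "email": ""},
    {"id": 3, "country": "US", "email": None},
]

for column, stats in profile(records, ["id", "country", "email"]).items():
    print(column, stats)
```

Running the same profile on a schedule, and comparing it with the previous run, turns a one-off audit into an early warning when a feed starts producing unexpected values.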

Sound managerial decisions rest not on rules of thumb or presumption but on hard facts and insights, which a rigorous and disciplined data analysis process provides. But data analysis can produce correct results only if the underlying data is reliable, relevant, and consistent.

Data services companies can provide a lot of value when it comes to data quality. Companies like Datavail can provide overnight and holiday oversight of your databases so your DBAs are on top of their game when performing data consolidations, exports, and migrations. We can also run database assessments and help your team implement best practices and processes so that database management is consistent and free of errors. Our expertise can give your data a solid baseline of infrastructure and support that positively affects data integrity.

In brief, bad data equals bad decisions. Data quality assurance is the key to keeping good data from being corrupted and bad data from being generated in the first place. Organizations need to shift from a reactive stance of scrubbing or doctoring data after the fact to a proactive stance of promoting best practices in data stewardship and governance, either internally or with the support of outside services.

According to the Harvard Business Review, the combined losses to the U.S. from poor-quality data are estimated to reach an astonishing $3 trillion in 2016. With figures like that, it is time we tested the ground we are walking on by making sure that the data we use is reliable.

To learn more about best practices for data quality assurance, please contact Datavail today. With more than 600 database administrators worldwide, Datavail is the largest database services provider in North America. As a reliable provider of 24×7 managed services for applications, BI/Analytics, and databases, Datavail can support your organization, regardless of the build you’ve selected.

Blog Author

Eric Russo

Senior Vice President of Database Services

Eric Russo is Senior Vice President of Database Services, overseeing all of Datavail’s database practices, including project and managed services for SQL Server, Oracle, MySQL, MongoDB, Db2, PostgreSQL, Cassandra, and AWS. He is also the product owner for Datavail TechBoost™ (formerly Datavail Delta), a cloud-based automation platform. He has more than 20 years’ experience in IT, most of it in database management. His management success and style have attracted top DBAs from around the world to create one of the largest and most talented SQL Server teams. He has been with Datavail since 2008. Before that, he worked as DBA Manager at StrataVia, Senior Web Developer at Manifest Information Systems, and SQL Server DBA at Clark County, Nevada.