Data analysis is not a new phenomenon. It has been around for a long time, changing and maturing along the way. In our information age, advances in computing power have turned data analytics into a specialized science based on algorithms, software tools, and database technology. Enterprises today use business and data analytics to glean information. This information could be about business trends, sales performance, lead sources (by region or by seasonal demand), the gender profile of customers, customer satisfaction, and more.
This information is used to do the following:
- Make business decisions based on patterns formed from existing data
- Predict future trends by forecasting results from existing data
- Use statistical analysis to determine why certain events happened or are currently taking place
The veracity of the results we derive from data depends upon the veracity of the raw data. In other words, incorrect or, simply put, bad data can have a domino effect. It can result in unreliable insights, bad decision-making, and loss of business.
What exactly is bad data?
There are numerous characteristics that can result in data being classified as bad and unreliable. Let’s take a look at a few of these:
- Duplicated or triplicated data
- Missing or incorrect information in fields
- Incorrect data types
- Incorrect data formats – dates stored in European date format and interpreted as U.S. format, for example
- Email addresses that are malformed or that do not exist
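Several of the characteristics listed above can be detected automatically. The following is a minimal Python sketch, not a complete data quality tool. The sample records, the field names (`id`, `email`, `signup`, `amount`), and the `find_problems` helper are all hypothetical and exist only for illustration. The email pattern is deliberately loose: it checks the format only and cannot tell whether an address actually exists.

```python
import re
from collections import Counter
from datetime import datetime

# Hypothetical sample records; the field names are illustrative only.
records = [
    {"id": "1", "email": "ann@example.com", "signup": "2016-03-04", "amount": "19.99"},
    {"id": "1", "email": "ann@example.com", "signup": "2016-03-04", "amount": "19.99"},
    {"id": "2", "email": "bob@example", "signup": "04/03/2016", "amount": "abc"},
    {"id": "3", "email": "", "signup": "2016-13-01", "amount": "5"},
]

# Loose format check: something@something.something, with no spaces.
EMAIL_RE = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")


def is_iso_date(value):
    """Return True if value is a valid date written as YYYY-MM-DD."""
    try:
        datetime.strptime(value, "%Y-%m-%d")
        return True
    except ValueError:
        return False


def is_number(value):
    """Return True if value can be parsed as a number."""
    try:
        float(value)
        return True
    except ValueError:
        return False


def find_problems(rows):
    """Return a list of (row_index, description) pairs for suspect rows."""
    problems = []
    # Count identical rows so that every copy of a duplicate is flagged.
    counts = Counter(tuple(sorted(row.items())) for row in rows)
    for index, row in enumerate(rows):
        if counts[tuple(sorted(row.items()))] > 1:
            problems.append((index, "duplicate row"))
        for field, value in row.items():
            if value == "":
                problems.append((index, f"missing {field}"))
        if row["email"] and not EMAIL_RE.match(row["email"]):
            problems.append((index, "invalid email format"))
        if row["signup"] and not is_iso_date(row["signup"]):
            problems.append((index, "date not in YYYY-MM-DD format"))
        if row["amount"] and not is_number(row["amount"]):
            problems.append((index, "amount is not numeric"))
    return problems


for index, description in find_problems(records):
    print(index, description)
```

Note that a value such as `04/03/2016` is flagged rather than converted. It could mean 4 March or 3 April, and a script cannot safely decide which without knowing where the data came from.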
What causes bad data?
Bad data may be an endemic yet unrecognized issue in an application. This could be due to bad database design or to programming errors that go undetected before application releases.
Data is often consolidated and exported from database tables, using ETL tools, to intermediate formats such as CSV files. This is often done as a precursor to uploading to a data warehouse for further business analytics. Errors may occur at this stage for reasons such as improper field mapping or field definition, thereby converting good data into bad.
Apart from the errors above, data can be incorrect simply because of mistakes made during manual entry. This type of issue occurs randomly and is much more difficult to avoid.
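One way to catch mapping and field-definition errors is to check each export against the definitions it is supposed to follow before loading it anywhere else. The sketch below uses only the Python standard library. The field names in `EXPECTED_FIELDS`, the `check_export` helper, and the idea of passing in the source row count are assumptions made for illustration, not part of any particular ETL tool.

```python
import csv
import io
from datetime import datetime

# Hypothetical field definitions: column name -> function that parses a value
# and raises ValueError or TypeError if the value does not fit the definition.
EXPECTED_FIELDS = {
    "customer_id": int,
    "order_date": lambda text: datetime.strptime(text, "%Y-%m-%d"),
    "total": float,
}


def check_export(csv_text, expected_row_count):
    """Return a list of error messages for an exported CSV document."""
    errors = []
    reader = csv.DictReader(io.StringIO(csv_text))

    # A header that differs from the definitions suggests a mapping error.
    if reader.fieldnames != list(EXPECTED_FIELDS):
        errors.append(f"unexpected header: {reader.fieldnames}")
        return errors

    row_count = 0
    # Data starts on line 2 because line 1 is the header.
    for line_no, row in enumerate(reader, start=2):
        row_count += 1
        for field, parse in EXPECTED_FIELDS.items():
            try:
                parse(row[field])
            except (ValueError, TypeError):
                errors.append(f"line {line_no}: bad {field} value {row[field]!r}")

    # Compare against the number of rows read from the source table.
    if row_count != expected_row_count:
        errors.append(f"expected {expected_row_count} rows, found {row_count}")
    return errors


export = "customer_id,order_date,total\n1,04/03/2016,19.99\n"
for message in check_export(export, expected_row_count=2):
    print(message)
```

Running the check immediately after the export, rather than after the warehouse load, keeps a bad file from propagating into downstream reports.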
What are the consequences?
At the OLTP level, bad data can result in invoices going to the wrong customer, vendors not getting paid on time, and, in extreme cases, consumers being charged astronomical sums for products or services.
When it comes to decision-making or predictive analysis, bad data can result in wrong and potentially costly decisions, and even lawsuits. OLAP-based systems can yield seriously wrong conclusions if the underlying data is bad. One very interesting example relates to data collection, assimilation, and interpretation practices, and how faulty practices resulted in multiple issues for a hospital.
How do we get it right?
Ensuring data integrity and quality is essential for:
- Implementing operational efficiencies
- Meeting statutory and regulatory requirements
- Making sound business decisions to maximize revenue
Here are a few steps that can help ensure data quality:
- Define metadata and set up definitions of data
- Carry out a data audit
- Validate data at the stage where data is generated and when it is ported to another format
- Automate processes (e.g., for porting data) to the maximum extent possible
- Test applications thoroughly
- Enlist the services of companies focused on data quality
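As a rough illustration of what a simple data audit might report, the sketch below counts filled, missing, and distinct values for each field. The `audit` function and the sample rows are hypothetical, and a real audit would normally cover far more (ranges, formats, referential integrity), so treat this as a starting point only.

```python
def audit(rows, fields):
    """Summarize completeness and distinct values for each field."""
    report = {}
    total = len(rows)
    for field in fields:
        values = [row.get(field) for row in rows]
        # Treat both absent keys and empty strings as missing.
        filled = [value for value in values if value not in (None, "")]
        report[field] = {
            "filled": len(filled),
            "missing": total - len(filled),
            "distinct": len(set(filled)),
        }
    return report


# Hypothetical rows; the last one repeats an id and has no email key at all.
rows = [
    {"id": "1", "email": "ann@example.com"},
    {"id": "2", "email": ""},
    {"id": "2"},
]

for field, summary in audit(rows, ["id", "email"]).items():
    print(field, summary)
```

A field with fewer distinct values than filled values is not necessarily wrong, but for a field that is supposed to be unique, such as an identifier, it is a sign of duplicates worth investigating.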
Managerial decision-making should rely not on rules of thumb or presumption, but on hard facts and insights. Such facts and insights are provided to management by a rigorous and disciplined data analysis process. But data analysis can provide correct results only if the underlying data is reliable, relevant, and consistent.
Data services companies can provide a lot of value when it comes to data quality. Companies like Datavail can provide overnight and holiday oversight of your databases so your DBAs are on top of their game when performing data consolidations, exports, and migrations. We can also run database assessments and help your team implement best practices and processes to ensure database management is consistent and free of errors. Our expertise can ensure your data has a solid baseline of infrastructure and support that can positively affect data integrity.
In brief, bad data equals bad decisions. Data quality assurance is the key to preventing the corruption of good data or the generation of bad data in the first place. Organizations need to shift from a reactive stance of scrubbing or doctoring data to a proactive stance of promoting best practices in data stewardship and governance, either internally or with the support of outside services.
According to the Harvard Business Review, the combined losses to the U.S. on account of poor-quality data are estimated to reach an astonishing $3 trillion in 2016. With figures like that, it is time we started to feel the ground we are walking on by making sure that the data we are using is reliable.
To learn more about best practices for data quality assurance please contact Datavail today. With more than 600 database administrators worldwide, Datavail is the largest database services provider in North America. As a reliable provider of 24×7 managed services for applications, BI/Analytics, and databases, Datavail can support your organization, regardless of the build you’ve selected.