The demand for access to Big Data grows day by day, because it promises a significant competitive advantage for a business. Big data often includes sensitive customer details, an organization’s confidential information, and additional information from myriad sources. This data is often unstructured and dirty, and it needs to be transformed, refined, and secured before it can yield the expected results (a process often called data wrangling). This is where data governance comes in.
What is data governance?
Data governance is the set of policies and their aligned processes through which an organization collects, stores, secures, and uses information owned by the business. Data governance puts in place a framework to ensure that data is used consistently and consciously within the organization. It also creates and maintains controls around changes to the data, so governance covers not just the use of the data but also its management. A good data governance policy should consist of a set of procedures, an execution plan, and a governing body to ensure the integrity of the entire process for both today and tomorrow.
What is the relationship between Big Data and data governance?
Although both Big Data and data governance have been around for quite some time, the relationship between them is still new. The need for data governance in big data arises from the need of businesses to obtain accurate, reliable, and actionable insight into their existing data. In the absence of a proper data governance policy, data projects can result in excessive costs as well as misleading insights, which can eventually cause irrevocable damage to the business.
According to Malcolm Chisholm, President of AskGet, Inc., a data management consultancy firm in Holmdel, New Jersey, everything about data governance is fairly new in the realm of big data. As a result, many data managers aren’t quite sure how to approach it.
As Chisholm puts it:
“To get meaningful business information from big data, all sorts of things need to be done, like semantic analysis of the data, which is then rendered into conceptual models or ontologies… And all that involves a heap of governance stuff.”
Most people involved in data governance go only as far as enforcing data privacy and other data policy rules, which is simply not enough where big data is concerned. Big data governance needs to focus on creating trust and integrity in data, making it usable and visible, and ultimately ensuring that it delivers the expected benefits back to the business.
So how do you govern big data?
According to Dan Sholler, director of Product Marketing for data governance company Collibra, big data governance should kick off by defining an overall policy for data governance. This would include clear policies defined for data inventory, data ownership, data quality, information security, and data retention. Having policies in place for each of these allows the business to identify the right stakeholders and to enforce procedures to ensure efficiency and reliability, while reducing the risks of failure.
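As a rough illustration, the policy areas above can be captured as a machine-readable inventory so that gaps are easy to spot. The sketch below is only one possible shape; the `DatasetPolicy` class, its field names, and the sample datasets are hypothetical and do not come from Collibra or any specific tool.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass(frozen=True)
class DatasetPolicy:
    """One entry in a hypothetical data inventory."""
    name: str                 # data inventory: which dataset this is
    owner: str                # data ownership: accountable stakeholder
    classification: str       # information security level, e.g. "confidential"
    retention_days: int       # data retention period in days
    min_completeness: float   # data quality threshold, 0.0 to 1.0


def is_past_retention(policy: DatasetPolicy, created: date, today: date) -> bool:
    """Return True when a record has outlived its retention period."""
    return today - created > timedelta(days=policy.retention_days)


def missing_owners(inventory: list[DatasetPolicy]) -> list[str]:
    """List datasets that have no accountable owner assigned."""
    return [p.name for p in inventory if not p.owner.strip()]


# Illustrative sample entries (assumed values, not real policies).
inventory = [
    DatasetPolicy("customer_profiles", "crm-team", "confidential", 365, 0.98),
    DatasetPolicy("web_clickstream", "", "internal", 90, 0.90),
]
```

Calling `missing_owners(inventory)` here would flag `web_clickstream`, which is the kind of stakeholder gap a written policy is meant to surface before a project starts.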
Next, define the standards for the data, starting with the critical data sets that deliver the most value to the business. This includes identifying critical data elements, defining metadata and relationships, and, finally, documenting them for future reference. An automated process should be in place to handle the large data loads coming from different parts of the organization, as manual processes can’t keep up with such a large breadth of information.
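To make this concrete, here is a minimal sketch of how documented critical data elements could drive automated checks on incoming records. The `CATALOG` structure, the field names, and the rules are assumptions chosen for illustration; a real deployment would more likely rely on a dedicated metadata catalog or data quality tool.

```python
from typing import Any

# Hypothetical metadata catalog: each critical data element is documented with
# a description, an expected type, and a simple validation rule.
CATALOG: dict[str, dict[str, Any]] = {
    "customer_id": {
        "description": "Unique customer identifier",
        "type": str,
        "rule": lambda v: len(v) > 0,
    },
    "order_total": {
        "description": "Order value in USD",
        "type": float,
        "rule": lambda v: v >= 0,
    },
}


def validate_record(record: dict[str, Any]) -> list[str]:
    """Return a list of human-readable violations for one record."""
    problems = []
    for field, meta in CATALOG.items():
        if field not in record:
            problems.append(f"{field}: missing")
            continue
        value = record[field]
        if not isinstance(value, meta["type"]):
            problems.append(f"{field}: expected {meta['type'].__name__}")
        elif not meta["rule"](value):
            problems.append(f"{field}: failed rule")
    return problems


def validate_batch(records: list[dict[str, Any]]) -> dict[int, list[str]]:
    """Map each failing record's index to its violations; skip clean records."""
    report = {}
    for i, rec in enumerate(records):
        problems = validate_record(rec)
        if problems:
            report[i] = problems
    return report
```

Because the rules live next to the documentation for each element, the same catalog serves as both the reference for people and the input to the automated check.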
Today, analytical models are critical to a big data platform, so they go hand in hand with a data governance policy. The analytical model should be simple enough to be understood and used by both business users and data analysts. In addition, there should be a process in place to accommodate the new analytical requirements of end users. With the advent of newer tools like SlamData, analytical models become less critical, but the need for governance is not diminished.
Finally, and most importantly, there should be a dedicated team in place to handle the responsibilities of big data governance. The team should consist of a sponsor, a chief data officer, and several subject matter experts who constantly monitor, review, revise, and enhance the quality of the process and the information.
In short, Big Data can’t be expected to deliver its true value to a business without an established data governance policy in place. Data governance is still quite new where big data is concerned, but it is increasingly getting the attention of big data teams. Before long, it will become a must-have for any big data project, allowing businesses to fully leverage the power of their data.
If you need help establishing a data governance policy for your company, or to learn more, contact Datavail today. With more than 600 database administrators worldwide, Datavail is the largest database services provider in North America. With 24×7 managed services for applications, BI/Analytics, and databases, Datavail can support your organization, regardless of the build you’ve selected.