Your organization's data may well be “big” compared with a decade ago, but is it Big Data?
IBM estimates the world creates 2.5 quintillion bytes of data per day. In light of this, many people and organizations have attempted to define Big Data, resulting in a wide, sometimes disparate, range of definitions.
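To put IBM's figure in more familiar units, here is a back-of-the-envelope conversion. It assumes the short-scale quintillion (10^18) and decimal (SI) storage units, where 1 exabyte (EB) is 10^18 bytes and 1 terabyte (TB) is 10^12 bytes. Under those assumptions, the estimate works out to 2.5 EB per day, or roughly 28.9 TB every second.

```python
# Back-of-the-envelope conversion of IBM's estimate of 2.5 quintillion bytes per day.
# Assumptions: short-scale quintillion (10**18) and decimal SI units,
# so 1 EB = 10**18 bytes and 1 TB = 10**12 bytes.

BYTES_PER_DAY = 2_500_000_000_000_000_000  # 2.5 quintillion bytes
SECONDS_PER_DAY = 24 * 60 * 60             # 86,400 seconds

exabytes_per_day = BYTES_PER_DAY / 10**18
terabytes_per_second = BYTES_PER_DAY / SECONDS_PER_DAY / 10**12

print(f"{exabytes_per_day:.1f} EB per day")         # prints "2.5 EB per day"
print(f"{terabytes_per_second:.1f} TB per second")  # prints "28.9 TB per second"
```

The per-second figure is a useful reminder that volume and velocity go hand in hand: data at this scale arrives continuously, not in occasional batches.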
Advertising Age notes that it was “the amount of information, and the speed at which it can be created, collected and analyzed” that transformed routine data into Big Data. Cloud platforms, open-source applications, and cheaper hardware helped push Big Data to the fore by putting the necessary resources within reach of smaller, typically resource-constrained enterprises.
The term “Big Data” is commonly used to describe the vast volumes of data an enterprise produces every day. The most frequently cited definition of Big Data comes from Gartner analysts:
“Big data is high-volume, high-velocity and high-variety information assets that demand cost-effective, innovative forms of information processing for enhanced insight and decision making.”
O’Reilly’s Edd Dumbill contends that these three V’s (volume, velocity, and variety) are “a helpful lens through which to view and understand the nature of the data and the software platforms available to exploit them.” He says most people working with corporate databases will encounter each of these “to one degree or another.”
Parsing the definition can help organizations, and especially decision-makers or executives unfamiliar with the jargon, communicate more clearly about the data assets they have and the tools they need.
In terms of volume, Big Data describes data streams so large that they exceed the processing capacity of conventional database systems. IBM says volume is a given, because “Enterprises are awash with ever-growing data of all types, easily amassing terabytes — even petabytes — of information.”
These streams of data can be used to determine what consumers think of a product or to predict customer consumption patterns. Sensor data in particular, whether gathered for scientific discovery or to detect traffic patterns, inundates organizations with torrents of data, often in real time.
The velocity aspect of Big Data is essential for organizations that need to use data in real time. Some enterprises might need to monitor ongoing sales to detect fraudulent transactions. Others may need to analyze customer behavior to deliver a relevant suggestion that turns into a sale. Mastering data velocity can give organizations a competitive advantage.
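As a rough illustration of what acting on data “in real time” means, here is a minimal sketch that examines each sales transaction as it arrives instead of waiting for a nightly batch. It flags a card when more than a set number of transactions occur within a sliding time window. The `Transaction` type, the `flag_bursts` function, and the threshold and window values are all hypothetical, and a simple burst rule like this is only a stand-in for a real fraud-detection model.

```python
from collections import defaultdict, deque
from dataclasses import dataclass
from typing import Deque, Dict, Iterable, Iterator


@dataclass
class Transaction:
    # Hypothetical record shape, for illustration only.
    card_id: str
    timestamp: float  # seconds since an arbitrary epoch
    amount: float


def flag_bursts(
    stream: Iterable[Transaction],
    window_seconds: float = 60.0,
    max_per_window: int = 3,
) -> Iterator[Transaction]:
    """Yield each transaction that pushes its card above max_per_window
    transactions within the trailing window_seconds.

    Assumes transactions for a given card arrive in timestamp order.
    """
    # Per-card queue of recent timestamps; state stays small however long the stream runs.
    recent: Dict[str, Deque[float]] = defaultdict(deque)
    for tx in stream:
        times = recent[tx.card_id]
        # Evict timestamps that have fallen out of the trailing window.
        while times and tx.timestamp - times[0] > window_seconds:
            times.popleft()
        times.append(tx.timestamp)
        if len(times) > max_per_window:
            yield tx


events = [
    Transaction("card-1", 0, 20.0),
    Transaction("card-1", 10, 35.0),
    Transaction("card-1", 20, 12.5),
    Transaction("card-1", 30, 99.0),   # fourth within 60 seconds: flagged
    Transaction("card-2", 30, 5.0),
    Transaction("card-1", 200, 15.0),  # earlier activity has aged out
]
flagged = list(flag_bursts(events))
```

Because `flag_bursts` is a generator, it can sit on an unbounded feed and emit alerts as soon as a suspicious transaction appears, which is the property velocity demands.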
Big Data consists of a wide variety of data types, including sensor data, social media posts, various types of digital media files, sales transactions, and many other bits and bytes that organizations collect in real time. Not all of this data is structured; some of it is unstructured, such as free-form text.
IBM adds a fourth V, veracity, to its definition, estimating that “1 in 3 business leaders don’t trust the information they use to make decisions.” It adds, “How can you act upon information if you don’t trust it? Establishing trust in big data presents a huge challenge as the variety and number of sources grows.”
Once Big Data is being collected, the next step for organizations is identifying, or even creating, the tools and processes needed to analyze it. This could include determining whether open-source tools such as Apache Hadoop suit your enterprise. Or, if you’re already using Amazon Web Services or other cloud-based services, you may need a specific suite of tools to work effectively with that data.
Your organization can then use that analysis to develop a wide range of actionable business intelligence and insights. These should be grounded in specific business problems. Goals can range from broad projects, such as identifying trends, to narrower needs, such as discovering specific new revenue streams.