What Is Big Data?

Author: Patrick Gates | October 3, 2013

The data your organization has may certainly be “big” compared to a decade ago, but is it Big Data?

IBM estimates the world creates 2.5 quintillion bytes of data per day. In light of this, many people and organizations have attempted to define Big Data, resulting in a wide, sometimes disparate, range of definitions.

Advertising Age notes that it was “the amount of information, and the speed at which it can be created, collected and analyzed” that transformed routine data into Big Data. Cloud platforms, open-source applications and cheaper hardware helped push Big Data to the fore by putting these capabilities within reach of smaller, typically resource-constrained enterprises.

The term “Big Data” is commonly used to describe the vast volumes of data an enterprise produces every day. The most frequently cited definition comes from Gartner analysts:

“Big data is high-volume, high-velocity and high-variety information assets that demand cost-effective, innovative forms of information processing for enhanced insight and decision making.”

O’Reilly’s Edd Dumbill contends that these three V’s (volume, velocity and variety) are “a helpful lens through which to view and understand the nature of the data and the software platforms available to exploit them.” He says most people working with corporate databases will encounter each of them “to one degree or another.”

Parsing the definition can help organizations, and especially decision-makers or executives unfamiliar with the jargon, communicate more clearly about the data assets they have and the tools they need.

Volume

Big Data consists of data streams so large that they exceed the processing capacity of conventional database systems. IBM says volume is a given because “Enterprises are awash with ever-growing data of all types, easily amassing terabytes — even petabytes — of information.”

These streams of data can be used to determine what consumers think of a product or to predict customer consumption patterns. Sensor data in particular, whether gathered for scientific discovery or to detect traffic patterns, inundates organizations with torrents of data, often in real time.

Velocity

The velocity aspect of Big Data is essential to organizations that need to use data in real time. Some enterprises need to assess sales as they happen to detect fraudulent transactions. Others need to analyze customer behavior quickly enough to offer a relevant recommendation that turns into a sale. Organizations that can act on data as fast as it arrives gain a competitive advantage.
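Flagging suspicious transactions as they arrive is one concrete example of velocity. As a minimal sketch only, assuming a simple "too many transactions on one card within a short window" rule (a common illustrative heuristic, not a real fraud model and not a method described in this article), a sliding-window check in Python might look like this. The class and function names, the 60-second window and the three-transaction limit are all hypothetical.

```python
from collections import defaultdict, deque


class VelocityChecker:
    """Hypothetical sliding-window rule: flag a card whose number of
    transactions within the window exceeds a limit. Values are illustrative."""

    def __init__(self, window_seconds=60, max_txns=3):
        self.window_seconds = window_seconds
        self.max_txns = max_txns
        # card_id -> timestamps (in seconds) of recent transactions, oldest first
        self.recent = defaultdict(deque)

    def observe(self, card_id, timestamp):
        """Record one transaction and return True if the card now exceeds the limit.
        Assumes events for a given card arrive in timestamp order."""
        window = self.recent[card_id]
        window.append(timestamp)
        # Evict timestamps that have slid out of the window. The current event
        # is always kept, so the deque never becomes empty inside this loop.
        while timestamp - window[0] > self.window_seconds:
            window.popleft()
        return len(window) > self.max_txns


def flag_stream(events, window_seconds=60, max_txns=3):
    """Feed (card_id, timestamp) events through one checker; one verdict per event."""
    checker = VelocityChecker(window_seconds, max_txns)
    return [checker.observe(card_id, ts) for card_id, ts in events]


# Hypothetical event stream: card "a" makes four purchases in 30 seconds.
events = [("a", 0), ("a", 10), ("a", 20), ("a", 30), ("b", 30), ("a", 200)]
for (card_id, ts), suspicious in zip(events, flag_stream(events)):
    if suspicious:
        print(f"review card {card_id} at t={ts}s")
```

Each event does only a small amount of work against state kept in memory, which is what lets a check like this keep pace with a live stream instead of waiting for a nightly batch job.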

Variety

Big Data consists of a wide variety of data types: sensor data, social media posts, digital media files of many kinds, sales transactions and many other bits and bytes collected by organizations in real time. Not all of it is structured; some of it is unstructured, such as free-form text.

IBM adds a fourth V, veracity, to its definition, estimating that “1 in 3 business leaders don’t trust the information they use to make decisions.” It adds, “How can you act upon information if you don’t trust it? Establishing trust in big data presents a huge challenge as the variety and number of sources grows.”

As this data is collected, the next step for organizations is to identify, or even create, the tools and processes needed to analyze it. This could include determining whether open-source tools such as Apache Hadoop fit your enterprise. Or, if you’re already using Amazon Web Services or other cloud-based services, you may need a specific suite of tools to work effectively with that data.

Your organization can then use the results of that analysis to develop a wide range of actionable business intelligence and insights, ideally tied to specific business problems. Goals can range from broad projects, such as identifying trends, to narrower needs, such as discovering specific new revenue streams.

Blog Author

Patrick Gates

Vice President and Practice Leader of Oracle Services, Datavail

Patrick’s background includes 15 years of IT experience specializing in database architecture, database administration and performance tuning. He has managed the infrastructure for enterprise database operations spanning more than 300 databases, ranging in size from 10 gigabytes to 80 terabytes. Patrick has designed and developed comprehensive database administration solutions for high performance, reliability and integrity, including backup and recovery, fault-tolerant connectivity, operations and performance monitoring, reporting, automated storage management, BCDR, SOX compliance and Co-Sourcing. A former manager at Level 3 Communications, Patrick has valuable experience in database architecture and corporate data warehousing. Patrick’s hobbies include skiing, CrossFit, hockey and playing with his kids.
