Apache Hadoop: What’s the Big Deal?

In Big Data, Blog | October 11th, 2013

There’s a lot of buzz surrounding Apache Hadoop, an open-source software project for the distributed processing of Big Data. Hadoop has become nearly synonymous with Big Data itself: according to IDC analysts, the global market for Hadoop-MapReduce software is growing at a compound annual rate of 60.2%, with sales expected to climb from $77 million in 2011 to $812.8 million by 2016. Vendors including IBM and Oracle now offer tools, support, and services across the Hadoop ecosystem.

Hadoop’s best features and potential applications

Hadoop’s advocates point to attributes such as its scalability, cost effectiveness, flexibility, and fault tolerance. Hadoop allows users to add nodes to a cluster without significant changes, such as altering the data format or the applications running atop it.

Hadoop can be run on commodity servers, making it an affordable solution for smaller enterprises. Because Hadoop does not rely on any specific data type, it can work with both structured and unstructured data from multiple sources. Users can join and aggregate the data in various ways.
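To make the “join and aggregate” point concrete, here is a minimal sketch of the kind of MapReduce job Hadoop runs, written against the Hadoop 2.x Java API: a word count that aggregates occurrences of each token across an input directory. The class names and the input/output paths passed on the command line are illustrative placeholders, not anything prescribed by the project.

    import java.io.IOException;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.Mapper;
    import org.apache.hadoop.mapreduce.Reducer;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

    public class WordCount {

      // Mapper: emits (word, 1) for every token in its input split.
      public static class TokenizerMapper
          extends Mapper<LongWritable, Text, Text, IntWritable> {
        private static final IntWritable ONE = new IntWritable(1);
        private final Text word = new Text();

        @Override
        protected void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
          for (String token : value.toString().split("\\s+")) {
            if (!token.isEmpty()) {
              word.set(token);
              context.write(word, ONE);
            }
          }
        }
      }

      // Reducer: sums the counts emitted for each word.
      public static class SumReducer
          extends Reducer<Text, IntWritable, Text, IntWritable> {
        @Override
        protected void reduce(Text key, Iterable<IntWritable> values, Context context)
            throws IOException, InterruptedException {
          int sum = 0;
          for (IntWritable v : values) {
            sum += v.get();
          }
          context.write(key, new IntWritable(sum));
        }
      }

      public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Job job = Job.getInstance(conf, "word count");
        job.setJarByClass(WordCount.class);
        job.setMapperClass(TokenizerMapper.class);
        job.setCombinerClass(SumReducer.class);   // local pre-aggregation on each mapper
        job.setReducerClass(SumReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));    // e.g. an HDFS input directory
        FileOutputFormat.setOutputPath(job, new Path(args[1]));  // output directory must not yet exist
        System.exit(job.waitForCompletion(true) ? 0 : 1);
      }
    }

The same map/shuffle/reduce pattern handles structured records (keyed by a customer ID, a date, and so on) just as readily as free text, which is what gives Hadoop its flexibility with mixed data sources.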

Hadoop is also fault-tolerant and automatically redirects work so processing can continue if a node is lost. As IBM explains:

“It is designed to scale up from a single server to thousands of machines, with a very high degree of fault tolerance. Rather than relying on high-end hardware, the resiliency of these clusters comes from the software’s ability to detect and handle failures at the application layer.”
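As a rough illustration of where that application-layer resilience comes from, the sketch below (assuming Hadoop 2.x property names) touches the two relevant knobs: HDFS block replication, so data survives a lost node, and task retry limits, so the framework re-runs failed map or reduce tasks elsewhere. The values shown are the shipped defaults, not tuning advice.

    import org.apache.hadoop.conf.Configuration;

    public class FaultToleranceDefaults {
      public static void main(String[] args) {
        Configuration conf = new Configuration();
        // Each HDFS block is stored on multiple DataNodes, so losing one node loses no data.
        conf.set("dfs.replication", "3");
        // A failed map or reduce task is automatically retried, typically on another node.
        conf.set("mapreduce.map.maxattempts", "4");
        conf.set("mapreduce.reduce.maxattempts", "4");
        System.out.println("Replication factor: " + conf.get("dfs.replication"));
      }
    }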

Although Hadoop is an analytics tool, most enterprise users are reportedly deploying it for storage and Extract, Transform, Load (ETL) tasks rather than for analytics.
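A typical storage/ETL pattern is simply landing raw extracts in HDFS and transforming them later. Below is a minimal sketch using the HDFS FileSystem Java API; the class name, file paths, and file names are hypothetical, and the transform step is left to whatever MapReduce, Hive, or Pig job a team prefers.

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class HdfsLanding {
      public static void main(String[] args) throws Exception {
        // Connect to the cluster's default file system (fs.defaultFS from core-site.xml).
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);

        // Land a raw extract in HDFS; both paths are illustrative placeholders.
        Path local = new Path("/tmp/orders-2013-10-11.csv");
        Path landing = new Path("/data/landing/orders/2013-10-11/orders.csv");
        fs.copyFromLocalFile(local, landing);

        // A downstream MapReduce (or Hive/Pig) job would then cleanse and aggregate
        // this raw file before it is loaded anywhere else.
        System.out.println("Landed " + fs.getFileStatus(landing).getLen() + " bytes in HDFS");
      }
    }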

As Cade Metz at Wired observes:

“Hadoop reinvented data analysis not only at Facebook and Yahoo but so many other web services. And then an army of commercial software vendors started selling the thing to the rest of the world. Soon, even the likes of Oracle and Greenplum were hawking Hadoop. These companies still treated Hadoop as an adjunct to the traditional database — as a tool suited only to certain types of data analysis. But now, that’s changing too.”

It is changing, but slowly, according to Matt Asay, vice president of corporate strategy at 10gen, which created MongoDB. He says:

“We’re still early in Hadoop’s technological and market evolution, in part due to the complexity of the technology, with 26% of even the most sophisticated Hadoop users citing how long it takes to get into production as a gating factor to its widespread use. Gartner reveals even lower rates of adoption of Big Data projects, often involving Hadoop, at a mere 6%, as enterprises try to grapple with both appropriate use cases and understanding the relevant technology. […] We’re still getting comfortable with Hadoop.”

Image by Intel Free Press.

Eric Russo
Senior Vice President of Database Services
Eric Russo is SVP of Database Services, overseeing all of Datavail’s database practices, including project and managed services for MS SQL, Oracle, Oracle EBS, MySQL, MongoDB, SharePoint, and DB2. He is also the Product Owner for Datavail Delta, a database monitoring tool. He has 21 years of experience in technology, including 16 years in database management. His management style and success have attracted top DBAs from around the world, building one of the largest and most talented SQL Server teams. He has been with Datavail since 2008; prior to that, his roles included DBA Manager at StrataVia, Senior Web Developer at Manifest Information Systems, and SQL Server DBA at Clark County, Nevada.
