There’s a lot of buzz surrounding Apache Hadoop, an open-source software project for the distributed processing of Big Data. Hadoop and Big Data are one and the same since the global market for Hadoop-MapReduce software now has a compound annual growth rate of 60.2%, with sales expected to increase from $77 million in 2011 to $812.8 million by 2016, according to IDC analysts. Vendors including IBM and Oracle offer support for tools and services in the Hadoop ecosystem.
Hadoop’s best features and potential applications
Those extolling Hadoop’s use point to Hadoop’s attributes, including its scalability, cost efficacy, flexibility, and fault tolerance. Hadoop allows users to add nodes without needing to make significant changes, such as altering the data format or the applications running atop it.
Hadoop can be run on commodity servers, making it an affordable solution for smaller enterprises. Because Hadoop does not rely on any specific data type, it can work with both structured and unstructured data from multiple sources. Users can join and aggregate the data in various ways.
Hadoop is also fault-tolerant and automatically redirects work so processing can continue if a node is lost. As IBM explains:
“It is designed to scale up from a single server to thousands of machines, with a very high degree of fault tolerance. Rather than relying on high-end hardware, the resiliency of these clusters comes from the software’s ability to detect and handle failures at the application layer.”
Although Hadoop is an analytics tool, most enterprise users are reportedly deploying it for storage and Extract, Transform, Load (ETL) tasks rather than for analytics.
As Cade Metz at Wired observes:
“Hadoop reinvented data analysis not only at Facebook and Yahoo but so many other web services. And then an army of commercial software vendors started selling the thing to the rest of the world. Soon, even the likes of Oracle and Greenplum were hawking Hadoop. These companies still treated Hadoop as an adjunct to the traditional database — as a tool suited only to certain types of data analysis. But now, that’s changing too.”
It is changing, but slowly, according to Matt Asay, vice president of corporate strategy at 10gen, which created MongoDB. He says:
“We’re still early in Hadoop’s technological and market evolution, in part due to the complexity of the technology, with 26% of even the most sophisticated Hadoop users citing how long it takes to get into production as a gating factor to its widespread use. Gartner reveals even lower rates of adoption of Big Data projects, often involving Hadoop, at a mere 6%, as enterprises try to grapple with both appropriate use cases and understanding the relevant technology. […] We’re still getting comfortable with Hadoop.”
Image by Intel Free Press.
Subscribe to Our Blog
Never miss a post! Stay up to date with the latest database, application and analytics tips and news. Delivered in a handy bi-weekly update straight to your inbox. You can unsubscribe at any time.
The “ORA-12154: TNS:could not resolve the connect identifier specified” Oracle error is a commonly seen message for database administrators.