Untangling YARN – What Is It?

Author: Eric Russo | June 2, 2014

Apache Hadoop has released version 2.2.0, which includes Apache YARN. YARN is widely regarded as one of the biggest changes in this release, but what is it?

YARN, or MapReduce 2.0, opens up Hadoop beyond MapReduce. Because it now separates resource management from the processing components of Hadoop, YARN enables users to interact in more varied and useful ways with their data.

YARN provides cluster resource management and allows applications and services to run natively in Hadoop. In the application stack, YARN sits atop the Hadoop Distributed File System, with engines such as Tez (the execution engine for interactive SQL queries), Storm, Giraph, and HBase layered above it.

Previously, MapReduce was the only processing engine: it ran jobs one at a time against data in the Hadoop Distributed File System (HDFS) to extract useful information. With YARN, multiple processing tools can work simultaneously on data stored in HDFS, so several applications can run in Hadoop at once.

YARN also splits the two primary responsibilities of the MapReduce JobTracker component, resource management and job scheduling/monitoring, into separate daemons: a global ResourceManager and a per-application ApplicationMaster. This lets users manage cluster resources within Hadoop better than they could previously.

Another way to think of it is that YARN packages the resource management capabilities that used to live inside MapReduce so that new engines can use them.
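The split between cluster-wide resource management and per-application scheduling can be pictured with a small toy model. The sketch below is plain Python and does not use any real Hadoop or YARN API; the class names, the `demo` function, the container counts, and the task names are all hypothetical and chosen only to illustrate the idea.

```python
from dataclasses import dataclass, field


@dataclass
class ResourceManager:
    """Toy stand-in for cluster-wide resource management (not the real YARN API)."""
    total_containers: int
    allocations: dict = field(default_factory=dict)

    def available(self) -> int:
        # Containers not yet handed out to any application.
        return self.total_containers - sum(self.allocations.values())

    def allocate(self, app_name: str, requested: int) -> int:
        # Grant up to `requested` containers, limited by what is free.
        granted = min(requested, self.available())
        self.allocations[app_name] = self.allocations.get(app_name, 0) + granted
        return granted


@dataclass
class ApplicationMaster:
    """Toy per-application scheduler: requests containers, then tracks its own tasks."""
    name: str
    tasks: list

    def run(self, rm: ResourceManager) -> dict:
        granted = rm.allocate(self.name, len(self.tasks))
        # Scheduling and monitoring stay with the application,
        # not with the resource manager.
        return {
            "app": self.name,
            "started": self.tasks[:granted],
            "waiting": self.tasks[granted:],
        }


def demo() -> list:
    # Hypothetical cluster with four containers shared by two different engines.
    rm = ResourceManager(total_containers=4)
    apps = [
        ApplicationMaster("mapreduce-job", ["map-0", "map-1", "reduce-0"]),
        ApplicationMaster("tez-query", ["vertex-0", "vertex-1"]),
    ]
    return [app.run(rm) for app in apps]


for result in demo():
    print(result)
```

In this sketch the resource manager knows nothing about maps, reduces, or query vertices; it only counts containers. That is the sense in which a new engine can reuse the same resource layer without going through MapReduce.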

Rohit Bakhshi, product manager at Hortonworks, told InfoQ:

By turning Apache Hadoop 2.0 into a multi-application data system, YARN enables the Hadoop community to address a generation of new requirements *in* Hadoop. YARN responds to these enterprise challenges by addressing the actual requirements at a foundational level rather than being commercial bolt-ons that complicate the environment for customers.

YARN is just one part of the larger Hadoop ecosystem. InfoWorld explains:

YARN is a foundational component of the evolving big data mosaic. YARN puts traditional Hadoop into a larger context of composable, fit-to-purpose platforms for processing the full gamut of data management, analytics, and transactional computing jobs. … YARN transforms Hadoop (however defined) into a general-purpose, distributed job-execution layer of the sort that the open source initiative’s original definition (still on the Apache website) alludes to. Though it retains backward compatibility with the MapReduce API and continues to execute MapReduce jobs, a YARN engine is capable of executing a wide range of jobs developed in other languages.

Several organizations are now building applications on YARN, according to Hortonworks.

Bakhshi added:

Hadoop is used in a variety of ways and because it is open source, we see all types of usage. Many organizations will start with just a small cluster comprised of just a few nodes and several terabytes, but eventually these environments grow and grow and grow until they result in a data lake and provide a modern data architecture. Small clusters are not ‘pre-mature’ – they are seeds.

This latest iteration of Hadoop was in development for about four years. Organizations reportedly using Hadoop include Amazon Web Services, AOL, Apple, eBay, Facebook, Netflix, and Hewlett-Packard.
