Hadoop YARN (Yet Another Resource Negotiator) is the open-source resource management layer of Apache Hadoop, used to run distributed applications on clusters of computers. It is the standard resource manager for Hadoop and for many tools in its ecosystem. The core of YARN is a resource manager, which allocates cluster resources to each application and tracks the applications running on the cluster. A client interface lets users submit applications and query their status. From a user’s viewpoint, YARN simplifies job submission, scheduling, and tracking across the Hadoop cluster.
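As an illustration of the status-query side of the client interface, the sketch below uses the ResourceManager's REST endpoint `/ws/v1/cluster/apps`, which returns the known applications as JSON. It is a minimal sketch, not a full client: the host name `rm.example.com`, the port 8088 (the usual default for the web interface), the helper names, and the sample payload are all assumptions made for illustration, and only a handful of the fields in the real response are read.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen


def apps_url(rm_address, states=None):
    """Build the ResourceManager REST URL that lists applications."""
    url = f"http://{rm_address}/ws/v1/cluster/apps"
    if states:
        # Optional filter, e.g. ["RUNNING", "ACCEPTED"]
        url += "?" + urlencode({"states": ",".join(states)})
    return url


def summarize_apps(payload):
    """Reduce the JSON document to (id, name, state, progress) tuples."""
    # The application list sits under apps -> app; "apps" may be null
    # when nothing matches, so fall back to an empty list.
    apps = (payload.get("apps") or {}).get("app") or []
    return [(a["id"], a["name"], a["state"], a["progress"]) for a in apps]


def fetch_apps(rm_address, states=None):
    """Query a live ResourceManager (requires network access to the cluster)."""
    with urlopen(apps_url(rm_address, states), timeout=10) as resp:
        return summarize_apps(json.load(resp))


if __name__ == "__main__":
    # Hypothetical payload, shaped like the REST response, so the
    # parsing can be tried without a cluster.
    sample = {"apps": {"app": [
        {"id": "application_1_0001", "name": "wordcount",
         "state": "RUNNING", "progress": 42.0},
    ]}}
    for row in summarize_apps(sample):
        print(row)
```

Against a real cluster, `fetch_apps("rm.example.com:8088", ["RUNNING"])` would return one tuple per running application. Submitting applications is normally done through the `yarn` command-line tool or the Java client library rather than by hand-built HTTP calls.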
The name YARN is an acronym for “Yet Another Resource Negotiator”; it does not refer to spinning thread. YARN lets multiple distributed applications share a single cluster, so that their resources can be managed in one place. YARN is composed of three key components: the resource manager, the node managers, and the application masters.
The resource manager allocates containers across the Hadoop cluster in order to execute jobs. A container is not a virtual machine; it is a bundle of resources, such as memory and CPU cores, granted on a particular node. The node managers, one per node, launch containers on their node and monitor their resource usage. Finally, each application has its own application master, which negotiates containers from the resource manager and then works with the node managers to run and monitor the application's tasks.
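The division of labour between the three components can be sketched with a toy in-memory model. This is not YARN's API or its scheduling algorithm: the class and method names are hypothetical, memory is the only resource tracked, and the first-fit placement is a stand-in for the real pluggable schedulers.

```python
from dataclasses import dataclass, field


@dataclass
class NodeManager:
    """One per node: launches containers and tracks what runs locally."""
    name: str
    free_mb: int
    running: list = field(default_factory=list)

    def launch(self, app_id, mem_mb):
        # Record a container started on behalf of an application.
        self.running.append((app_id, mem_mb))


class ResourceManager:
    """Cluster-wide arbiter: decides where a container may be placed."""

    def __init__(self, nodes):
        self.nodes = nodes

    def allocate(self, mem_mb):
        # Toy first-fit policy: reserve memory on the first node that has
        # enough free, or return None when the cluster is full.
        for node in self.nodes:
            if node.free_mb >= mem_mb:
                node.free_mb -= mem_mb
                return node
        return None


class ApplicationMaster:
    """One per application: negotiates containers, then runs tasks in them."""

    def __init__(self, app_id, rm):
        self.app_id = app_id
        self.rm = rm

    def run_tasks(self, n_tasks, mem_mb):
        placements = []
        for _ in range(n_tasks):
            node = self.rm.allocate(mem_mb)      # negotiate with the RM
            if node is None:
                break                            # no capacity left
            node.launch(self.app_id, mem_mb)     # ask the NM to start it
            placements.append(node.name)
        return placements


if __name__ == "__main__":
    nodes = [NodeManager("node1", 4096), NodeManager("node2", 2048)]
    am = ApplicationMaster("app_1", ResourceManager(nodes))
    print(am.run_tasks(n_tasks=4, mem_mb=2048))
```

With 6144 MB of total memory and 2048 MB per task, only three of the four requested tasks are placed. A real application master would keep its outstanding request open and receive the remaining container once resources are freed.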
By using YARN, cluster administrators can allocate resources across the Hadoop cluster more efficiently. YARN supports multi-tenancy and isolates applications from one another, which reduces contention between the different applications sharing the cluster, although it does not rule it out entirely.
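One common way administrators divide a shared cluster is through scheduler queues. Under the Capacity Scheduler, for example, each queue is given a percentage of its parent's capacity, and sibling percentages are expected to sum to 100. The sketch below only illustrates that arithmetic; the function name, the queue names, and the cluster size are hypothetical, and the real scheduler also handles CPU, per-queue maximums, and elastic sharing of idle capacity.

```python
def guaranteed_memory(cluster_mb, queue_capacity_pct):
    """Split cluster memory into per-queue guaranteed shares.

    queue_capacity_pct maps a queue name to its configured percentage.
    """
    total = sum(queue_capacity_pct.values())
    if abs(total - 100) > 1e-9:
        raise ValueError(f"queue capacities sum to {total}, expected 100")
    return {
        queue: int(cluster_mb * pct / 100)
        for queue, pct in queue_capacity_pct.items()
    }


if __name__ == "__main__":
    # Hypothetical 100 GB cluster shared by two tenants.
    shares = guaranteed_memory(102400, {"etl": 60, "adhoc": 40})
    for queue, mb in shares.items():
        print(queue, mb)
```

The guaranteed share is a floor rather than a hard limit: a queue may borrow idle capacity from its siblings, up to a configurable maximum.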
YARN has become an integral part of the Hadoop ecosystem, making it easier to submit, schedule, and track applications running on the Hadoop cluster. By handling resource management separately from any single processing framework, it has improved the scalability and usability of Hadoop and lets different kinds of applications share the computing power of the same cluster.