Apache Mesos vs Hadoop YARN Comparison


1. Objective

In this tutorial, we compare Apache Mesos and Hadoop YARN: how the two differ, where each one is the better fit, and which cluster manager to consider when running Spark on YARN versus Mesos.


2. Comparison between Apache Mesos and Hadoop YARN

Before comparing the two, it helps to review the basic concepts of Apache Mesos and Hadoop YARN.
The sections below then go through the differences between Apache Mesos and Hadoop YARN one by one.

a. Language Used

Apache Mesos: Mesos is written in C++, which suits time-sensitive work.

Hadoop YARN: YARN is written in Java.

b. Scheduler

Apache Mesos: When a job is submitted, the request reaches the Mesos master, which determines what resources are available and offers them to the framework. The framework then decides which offer is the best fit for the job it needs to run. Mesos is therefore a non-monolithic (two-level) scheduler: the master decides which resources to offer, and the framework's own scheduler decides what to run on them.

Hadoop YARN: When a job request reaches the YARN ResourceManager, it evaluates all the available resources and places the job accordingly. YARN is therefore a monolithic scheduler: a single process makes the scheduling decisions and deploys the jobs to be scheduled.
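The contrast between the two scheduling styles can be sketched as a toy model. This is only an illustration, not the Mesos or YARN API: the `Node`, `Job`, and `Framework` classes, the function names, and the first-fit and best-fit policies are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Node:
    name: str
    cpus: float
    mem_mb: int


@dataclass
class Job:
    name: str
    cpus: float
    mem_mb: int


def fits(node: Node, job: Job) -> bool:
    """True if the node has enough CPU and memory for the job."""
    return node.cpus >= job.cpus and node.mem_mb >= job.mem_mb


def monolithic_place(nodes: List[Node], job: Job) -> Optional[str]:
    """Monolithic style: one central scheduler sees every node and
    makes the placement decision itself (first fit, as an assumption)."""
    for node in nodes:
        if fits(node, job):
            return node.name
    return None


class Framework:
    """Hypothetical framework scheduler that receives resource offers."""

    def __init__(self, job: Job) -> None:
        self.job = job

    def on_offers(self, offers: List[Node]) -> Optional[str]:
        # Framework-specific policy (assumed): take the smallest offer that fits.
        candidates = [n for n in offers if fits(n, self.job)]
        if not candidates:
            return None  # decline every offer
        return min(candidates, key=lambda n: (n.cpus, n.mem_mb)).name


def two_level_place(nodes: List[Node], framework: Framework) -> Optional[str]:
    """Two-level style: the master only offers resources;
    the framework decides which offer to use."""
    return framework.on_offers(list(nodes))


nodes = [Node("n1", 8, 16384), Node("n2", 2, 4096), Node("n3", 4, 8192)]
job = Job("etl", 2, 2048)

print(monolithic_place(nodes, job))            # n1
print(two_level_place(nodes, Framework(job)))  # n2
```

In the monolithic function the placement policy lives in the central scheduler, while in the two-level function it lives in the framework, so each framework can apply its own policy to the same offers.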

c. Scheduling

Apache Mesos: Mesos schedules on both memory and CPU, and its scheduling is push-based: the master pushes resource offers to the frameworks.

Hadoop YARN: YARN schedules mainly on memory, and its scheduling is pull-based: applications request the resources they need.

d. Scalability

Apache Mesos: Because its scheduler is non-monolithic, Mesos is highly scalable.

Hadoop YARN: YARN is less scalable because its scheduler is monolithic.

e. Handling data center

Apache Mesos: If we want to manage the data center as a whole, Apache Mesos can manage every resource in it.

Hadoop YARN: YARN can safely manage Hadoop jobs, but it is not capable of managing the entire data center.

f. Abstraction

Apache Mesos: Mesos provides a low-level abstraction.

Hadoop YARN: YARN can itself run on top of Mesos, using Myriad.

g. Availability

Apache Mesos: High availability is achieved by running multiple Mesos masters. If the active master goes down, the master with the highest priority takes over.

Hadoop YARN: The YARN ResourceManager supports high availability.

h. Fault tolerance

Apache Mesos: Mesos provides fault tolerance at each level. At the master level, ZooKeeper monitors all the nodes in the master cluster, and if the active master fails, a new master is elected. At the framework level, two or more schedulers are registered with the master, so that if one scheduler fails, the master notifies another. At the slave level, if the slave process fails, its tasks continue running. Once the slave process has been restarted because it stopped responding to messages, it uses checkpointed data to recover its state and reconnect with its executors and tasks.

Hadoop YARN: If the YARN ResourceManager fails, it recovers by restoring its state from a persistent store on initialization. Once the recovery process is complete, it kills all the containers running in the cluster. When a NodeManager fails, the ResourceManager detects it through a heartbeat timeout, marks all the containers running on that node as killed, and reports the failure to all running ApplicationMasters. If the fault is transient, the NodeManager re-synchronizes with the ResourceManager, cleans up its local state, and continues.
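The heartbeat-timeout detection described for NodeManager failures can be sketched roughly as follows. This is a simplified model, not YARN code: the `ResourceManagerSketch` class, its method names, and the 10-second timeout are assumptions for illustration, and timestamps are passed in explicitly so the example is deterministic.

```python
from typing import Dict, List


class ResourceManagerSketch:
    """Toy model of detecting dead nodes by heartbeat timeout."""

    def __init__(self, timeout_s: float) -> None:
        self.timeout_s = timeout_s
        self.last_heartbeat: Dict[str, float] = {}   # node -> last seen time
        self.containers: Dict[str, List[str]] = {}   # node -> container ids

    def heartbeat(self, node: str, now: float) -> None:
        """Record that a node reported in at time `now`."""
        self.last_heartbeat[node] = now

    def launch(self, node: str, container_id: str) -> None:
        """Track a container as running on a node."""
        self.containers.setdefault(node, []).append(container_id)

    def expire_nodes(self, now: float) -> Dict[str, List[str]]:
        """Return {node: killed containers} for every node whose last
        heartbeat is older than the timeout, and forget those nodes."""
        lost: Dict[str, List[str]] = {}
        for node, seen in list(self.last_heartbeat.items()):
            if now - seen > self.timeout_s:
                lost[node] = self.containers.pop(node, [])
                del self.last_heartbeat[node]
        return lost


rm = ResourceManagerSketch(timeout_s=10.0)
rm.heartbeat("nm1", now=0.0)
rm.heartbeat("nm2", now=0.0)
rm.launch("nm1", "c1")
rm.launch("nm2", "c2")
rm.heartbeat("nm2", now=8.0)      # nm2 keeps reporting, nm1 goes silent

lost = rm.expire_nodes(now=12.0)
print(lost)                       # {'nm1': ['c1']}
```

In a fuller model, the returned mapping is what would be reported to the running ApplicationMasters so they can reschedule the lost work.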

i. Security

Apache Mesos: Only authenticated, trusted entities may interact with the Mesos cluster. Authentication is disabled by default. When it is enabled, the operator configures Mesos to use either the default authentication module or a custom one.

Hadoop YARN: Security in Hadoop YARN rests on several layers of defense: authentication, authorization, and auditing. Authentication takes two forms: user to service (for example, HTTP authentication) and service to service. For authorization, Apache Hadoop provides Unix-like file permissions and access control lists for YARN. For auditing, Apache Hadoop keeps audit logs for NameNodes that record file creation and opening, along with history logs for the JobTracker, JobHistoryServer, and ResourceManager.

j. Container requirement

Apache Mesos: When a framework asks for a container, it gets to choose from the resources offered to it. Thus, very little information needs to be passed.

Hadoop YARN: Each time a framework asks for a container, it sends a specification and its preferences, so a lot of information has to be passed.
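To make the difference in message size concrete, here is a rough sketch of what each side might send. Both classes are hypothetical and their field names are illustrative only; they do not reproduce the actual YARN or Mesos message types.

```python
from dataclasses import dataclass, fields
from typing import Tuple


@dataclass(frozen=True)
class YarnStyleRequest:
    """Assumed shape: the framework spells out what it wants and where."""
    memory_mb: int
    vcores: int
    priority: int = 0
    preferred_nodes: Tuple[str, ...] = ()
    preferred_racks: Tuple[str, ...] = ()
    relax_locality: bool = True


@dataclass(frozen=True)
class MesosStyleAccept:
    """Assumed shape: the framework points at an offer it already
    received and says how much of it to use."""
    offer_id: str
    cpus: float
    mem_mb: int


yarn_req = YarnStyleRequest(
    memory_mb=2048,
    vcores=2,
    priority=1,
    preferred_nodes=("node-7",),
    preferred_racks=("rack-2",),
)
mesos_accept = MesosStyleAccept(offer_id="offer-42", cpus=2.0, mem_mb=2048)

# Number of fields each message carries in this toy model.
print(len(fields(yarn_req)), len(fields(mesos_accept)))  # 6 3
```

The request-based message has to carry placement preferences because the central scheduler makes the placement decision, whereas the offer-based reply can stay small because the framework has already seen the candidate resources.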


