new JVM instead of a new Java thread?
- This topic has 3 replies, 1 voice, and was last updated 5 years, 7 months ago by DataFlair Team.
September 20, 2018 at 11:23 am #4622 DataFlair Team
Why is a new mapper or reducer task started in a separate JVM? Why does Hadoop launch a new JVM instead of a new thread when it starts a new task (either a mapper or a reducer)? Why is a mapper or reducer launched as a heavyweight process rather than a lightweight thread?
September 20, 2018 at 11:23 am #4623 DataFlair Team
Firstly, the MapReduce algorithm is built for distributed processing systems. Threads run inside a process, and a process cannot extend beyond the boundary of one operating system, i.e. a single machine. Thus, on each machine a process is launched instead of a thread.
Finally, threads share data, variables, and resources, whereas MapReduce works on separate chunks of data. These chunks are distributed across the cluster, which makes it difficult to use threads across nodes that hold different parts of the dataset.
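To make the point about sharing concrete, here is a small plain-Java sketch. It is not Hadoop code, and the class SharedHeapDemo and its method are hypothetical names used only for illustration. Every thread started inside one JVM writes to the same static counter, because threads share their process's heap. A thread running on another machine could never see that variable, which is why work spread over a cluster has to be handed to separate processes.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

/**
 * Hypothetical illustration (not Hadoop code): threads inside one JVM share one heap,
 * so a static field written by one thread is visible to every other thread in that JVM.
 */
class SharedHeapDemo {
    // One copy per JVM; every worker thread below reads and writes this same object.
    static final AtomicInteger sharedCounter = new AtomicInteger();

    /** Starts `threads` workers that each increment the shared counter `perThread` times. */
    static int incrementFromThreads(int threads, int perThread) {
        sharedCounter.set(0);
        List<Thread> workers = new ArrayList<>();
        for (int i = 0; i < threads; i++) {
            Thread t = new Thread(() -> {
                for (int j = 0; j < perThread; j++) {
                    sharedCounter.incrementAndGet();
                }
            });
            workers.add(t);
            t.start();
        }
        for (Thread t : workers) {
            try {
                t.join(); // wait until this worker has finished all its increments
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        // Every thread's increments landed in the same variable.
        return sharedCounter.get();
    }
}
```

With 4 threads doing 1000 increments each, the method returns 4000: all writes went to one variable that exists only inside this JVM on this machine.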
September 20, 2018 at 11:23 am #4624 DataFlair Team
The MapReduce framework runs one map task per input split (by default, one HDFS block), and the records in that split are processed sequentially; with the default TextInputFormat, each record is one line. The default Mapper.run() method calls map() for each record on a single thread. Overriding run() to build a multi-threaded mapper takes real work, although it can be done (Hadoop also ships org.apache.hadoop.mapreduce.lib.map.MultithreadedMapper for this case). However, managing and controlling such a mapper, and reporting its progress through heartbeats, becomes more complex.
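A dependency-free sketch of the difference, assuming simplified stand-ins for Hadoop's types: LineReader and map() are hypothetical, not Hadoop APIs. runSingleThreaded follows the shape of the default record loop (fetch the next record, call map, repeat). runMultiThreaded shows what a thread pool adds: the record reader must be synchronized, the output must be a concurrent map, and a failure in any worker has to be collected and rethrown.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.BiConsumer;

class MapperRunSketch {
    /** Hypothetical stand-in for a record reader over one input split: one line per record. */
    static final class LineReader {
        private final Iterator<String> lines;

        LineReader(List<String> lines) {
            this.lines = lines.iterator();
        }

        /** Returns the next record, or null at the end of the split. Synchronized so threads can share it. */
        synchronized String next() {
            return lines.hasNext() ? lines.next() : null;
        }
    }

    /** Hypothetical map function: emits (word, 1) for every word in a line. */
    static void map(String line, BiConsumer<String, Integer> emit) {
        for (String word : line.split("\\s+")) {
            if (!word.isEmpty()) {
                emit.accept(word, 1);
            }
        }
    }

    /** Same shape as the default record loop: one thread, records processed in order. */
    static Map<String, Integer> runSingleThreaded(List<String> split) {
        Map<String, Integer> out = new HashMap<>();
        LineReader reader = new LineReader(split);
        String line;
        while ((line = reader.next()) != null) {
            map(line, (k, v) -> out.merge(k, v, Integer::sum));
        }
        return out;
    }

    /** Multi-threaded variant: shared reader, thread-safe output, explicit error propagation. */
    static Map<String, Integer> runMultiThreaded(List<String> split, int threads) {
        Map<String, Integer> out = new ConcurrentHashMap<>(); // merge() is atomic here
        LineReader reader = new LineReader(split);
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<?>> workers = new ArrayList<>();
            for (int i = 0; i < threads; i++) {
                workers.add(pool.submit(() -> {
                    String line;
                    while ((line = reader.next()) != null) {
                        map(line, (k, v) -> out.merge(k, v, Integer::sum));
                    }
                }));
            }
            // If any worker fails, the whole task must fail instead of reporting partial output.
            for (Future<?> worker : workers) {
                worker.get();
            }
            return out;
        } catch (ExecutionException e) {
            throw new IllegalStateException("a mapper thread failed", e.getCause());
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } finally {
            pool.shutdownNow();
        }
    }
}
```

For the input lines "a b a", "b c" and "a", both methods produce the counts a=3, b=2, c=1; the extra synchronization and error handling in the multi-threaded version is the added complexity the answer refers to.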
September 20, 2018 at 11:24 am #4625 DataFlair Team
The MapReduce framework provides fault-tolerant execution. To support this, it is better to launch each task in its own JVM rather than in a thread. When a task fails, the same task can be restarted in a fresh environment of its own. This gives better management, control, and reporting of each task. Managing threads is comparatively complex: if a thread hangs, it has to be stopped, yet Java offers no safe way to kill a single thread from outside, and the task would have to be rerun anyway. A task that exhausts the heap also starves every other task sharing that JVM. Launching a new JVM per task avoids these problems, because killing a whole process is clean and the operating system reclaims everything it held.
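A minimal sketch of the per-task-process idea, using only the JDK's ProcessBuilder rather than Hadoop's actual task launcher. TaskSupervisor, runAttempt and runWithRetries are hypothetical names, and the command is assumed to be whatever starts one task attempt (in Hadoop's case, a child JVM). A hung attempt is killed with destroyForcibly(), which, unlike stopping a thread, cannot be ignored by the code being killed, and every retry starts again from scratch in a brand-new process. Hadoop caps retries with its own setting (mapreduce.map.maxattempts, default 4); the maxAttempts parameter here only imitates that idea.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.List;
import java.util.concurrent.TimeUnit;

class TaskSupervisor {
    /** Value returned (hypothetically) for an attempt that was killed after hanging. */
    static final int KILLED = -1;

    /**
     * Runs one task attempt as a separate OS process and returns its exit code,
     * or KILLED if it did not finish within timeoutMillis.
     */
    static int runAttempt(List<String> command, long timeoutMillis) {
        try {
            Process child = new ProcessBuilder(command)
                    .redirectErrorStream(true)                       // merge stderr into stdout
                    .redirectOutput(ProcessBuilder.Redirect.DISCARD) // and drop it so pipes never fill up
                    .start();
            if (!child.waitFor(timeoutMillis, TimeUnit.MILLISECONDS)) {
                // The hung attempt is killed as a whole process; the OS reclaims its memory,
                // threads and file handles, which stopping a single Java thread cannot guarantee.
                child.destroyForcibly();
                child.waitFor();
                return KILLED;
            }
            return child.exitValue();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for task attempt", e);
        }
    }

    /**
     * Reruns a failed or hung attempt from scratch in a brand-new process.
     * Returns the attempt number that succeeded (exit code 0), or -1 if every attempt failed.
     */
    static int runWithRetries(List<String> command, int maxAttempts, long timeoutMillis) {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            if (runAttempt(command, timeoutMillis) == 0) {
                return attempt;
            }
        }
        return -1;
    }
}
```

For example, with javaBin pointing at the current JDK's java launcher, runWithRetries(List.of(javaBin, "-version"), 3, 60_000) succeeds on the first attempt and returns 1, while a command naming a main class that does not exist fails every attempt and returns -1.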