New JVM instead of a new Java thread?

This topic has 3 replies, 1 voice, and was last updated 5 years, 7 months ago by DataFlair Team.

- September 20, 2018 at 11:23 am #4622
  
  DataFlair Team
  Spectator
  
  Why is each new mapper or reducer task started in a separate JVM? Why does Hadoop launch a new JVM instead of a new thread when it launches a new task (either a mapper or a reducer)? Why is a mapper or reducer launched as a heavyweight process rather than as a lightweight thread?
- September 20, 2018 at 11:23 am #4623
  
  DataFlair Team
  Spectator
  
  Firstly, MapReduce is built for distributed processing. A thread is a unit of execution inside a process, and it cannot exist outside the operating system of the single machine that process runs on. So on each machine a new process has to be launched rather than a thread.
  Secondly, threads share data, variables, and resources, while MapReduce tasks work on different chunks of data. These chunks are distributed over the cluster, which makes it difficult to run tasks as threads across nodes that hold different parts of the dataset.
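To illustrate the shared-state point above, here is a minimal Java sketch. It assumes task code that keeps scratch state in a static field; `SharedStateDemo`, `SEEN`, and `runTask` are made-up names for illustration, not Hadoop classes.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical demo class; not part of Hadoop.
class SharedStateDemo {
    // Static state exists once per JVM, so every thread in that JVM sees it.
    static final List<String> SEEN = Collections.synchronizedList(new ArrayList<>());

    // Stand-in for task code that records something in a static field.
    static void runTask(String taskId) {
        SEEN.add(taskId);
    }

    // Runs two "tasks" as threads inside this JVM and returns how many
    // entries the shared list holds once both have finished.
    static int runTwoTasksAsThreads() {
        SEEN.clear();
        Thread a = new Thread(() -> runTask("map-0"));
        Thread b = new Thread(() -> runTask("map-1"));
        a.start();
        b.start();
        try {
            a.join();
            b.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for tasks", e);
        }
        return SEEN.size();
    }

    public static void main(String[] args) {
        // Both tasks wrote into the same list, so this prints 2.
        System.out.println("entries visible to both tasks: " + runTwoTasksAsThreads());
    }
}
```

If each task ran in its own JVM instead, each would have its own copy of `SEEN` holding a single entry, so one task's state could not leak into another's.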
- September 20, 2018 at 11:23 am #4624
  
  DataFlair Team
  Spectator
  
  The MapReduce framework uses one mapper per input split (by default, one block), and the data in the split is processed sequentially, record by record (line by line for text input). The default run() method of a mapper is single-threaded. Overriding run() to create a multi-threaded mapper is hard work (but it can be done). However, the management, control, and heartbeat reporting then become more complex.
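The sequential processing described above can be sketched roughly as follows. This is a simplified stand-in written against plain Java collections, not the Hadoop `Mapper` API: `SequentialMapperSketch` and `countWords` are hypothetical names, and the word counts are aggregated in memory only to keep the example short.

```java
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Simplified stand-in for a mapper; hypothetical names, not the Hadoop API.
class SequentialMapperSketch {
    final Map<String, Integer> output = new TreeMap<>();

    void setup() {
        output.clear();
    }

    // Called once per record; here a record is one line of text.
    void map(String line) {
        for (String word : line.trim().split("\\s+")) {
            if (!word.isEmpty()) {
                output.merge(word, 1, Integer::sum);
            }
        }
    }

    void cleanup() {
        // Nothing to release in this sketch.
    }

    // A single thread walks the split record by record, in order.
    void run(Iterator<String> records) {
        setup();
        try {
            while (records.hasNext()) {
                map(records.next());
            }
        } finally {
            cleanup();
        }
    }

    static Map<String, Integer> countWords(List<String> lines) {
        SequentialMapperSketch mapper = new SequentialMapperSketch();
        mapper.run(lines.iterator());
        return mapper.output;
    }

    public static void main(String[] args) {
        System.out.println(countWords(List.of("a b a", "b c"))); // prints {a=2, b=2, c=1}
    }
}
```

Making `run()` multi-threaded would mean guarding `output` and the record iterator against concurrent access, which is part of the extra complexity mentioned in the post.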
- September 20, 2018 at 11:24 am #4625
  
  DataFlair Team
  Spectator
  
  The MapReduce framework provides fault-tolerant execution. To allow this, it is better to launch each task in its own JVM rather than in a thread. When a task fails, the same task can be restarted in a fresh environment of its own. This gives better management, control, and reporting for each task. Managing threads is comparatively more complex: if a thread hangs, it has to be killed, and the task would have to resume from where it left off. Launching a new JVM avoids all of this overhead.
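As a rough sketch of the "one attempt, one JVM" idea above, the following Java code launches a task in a child JVM, kills it if it hangs, and retries on failure. This is not how Hadoop itself launches tasks; `TaskAttemptSketch`, its methods, the timeout, and the `-1` failure code are all assumptions made for illustration.

```java
import java.io.IOException;
import java.util.List;
import java.util.concurrent.TimeUnit;
import java.util.function.IntSupplier;

// Hypothetical sketch; not Hadoop's actual task launcher.
class TaskAttemptSketch {

    // Builds the command line for a fresh child JVM running mainClass.
    static List<String> childJvmCommand(String mainClass, String taskId) {
        String javaBin = System.getProperty("java.home") + "/bin/java";
        String classPath = System.getProperty("java.class.path");
        return List.of(javaBin, "-cp", classPath, mainClass, taskId);
    }

    // Runs one attempt in its own JVM. Returns the child's exit code,
    // or -1 if it could not be launched, hung, or the wait was interrupted.
    static int runInChildJvm(String mainClass, String taskId, long timeoutSeconds) {
        Process child = null;
        try {
            child = new ProcessBuilder(childJvmCommand(mainClass, taskId))
                    .inheritIO()
                    .start();
            if (!child.waitFor(timeoutSeconds, TimeUnit.SECONDS)) {
                // Hung attempt: kill the whole process; the OS reclaims its resources.
                child.destroyForcibly();
                return -1;
            }
            return child.exitValue();
        } catch (IOException e) {
            return -1;
        } catch (InterruptedException e) {
            if (child != null) {
                child.destroyForcibly();
            }
            Thread.currentThread().interrupt();
            return -1;
        }
    }

    // Repeats the attempt until it returns 0 or maxAttempts is reached.
    // Returns the number of attempts used, or -1 if every attempt failed.
    static int runWithRetries(IntSupplier attempt, int maxAttempts) {
        for (int i = 1; i <= maxAttempts; i++) {
            if (attempt.getAsInt() == 0) {
                return i;
            }
        }
        return -1;
    }
}
```

A call such as `TaskAttemptSketch.runWithRetries(() -> TaskAttemptSketch.runInChildJvm("demo.Task", "map-0", 600), 4)` would give every retry a clean JVM, where `demo.Task` is a hypothetical main class on the classpath. A hung thread, by contrast, cannot be stopped safely from outside in Java, which is why the process boundary makes cleanup simpler.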