Apache Hadoop Forum › Why does the Mapper run in a heavyweight process and not in a thread?
September 20, 2018 at 12:43 pm — DataFlair Team:
Why is a Mapper / Map Task a process and not a thread?
Why is each task launched in a new (heavyweight) process rather than in a thread?
September 20, 2018 at 12:43 pm — DataFlair Team:
Each task is launched as a separate process instead of a thread because:
- Mappers run across a Hadoop cluster in a distributed manner: in a distributed processing environment the work is split and run in parallel. Threads are multiple tasks within a single process that share the same memory area and data, and they usually stay within the boundary of a single machine, whereas each mapper processes different data (since the data is distributed).
- Each Mapper task in Hadoop runs as a separate JVM process because MapReduce jobs are long-running and tasks can be killed by failures of commodity hardware. If MapReduce were implemented with threads, one error in a single mapper could kill the entire process, and every task would have to be re-run (since, as stated above, they would all be sub-tasks of the same process).
- Managing threads is relatively more complex: if a thread's execution hangs, it must be killed, and the task would then have to restart from where it left off.
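The isolation argument above can be sketched in a few lines of Java. This is a hypothetical illustration, not Hadoop code: the parent JVM launches a child JVM that dies immediately (here simulated by pointing it at a nonexistent main class, standing in for a crashed map task), observes the non-zero exit status, and carries on unaffected — which is exactly what lets the framework simply reschedule a failed task. A thread, by contrast, shares the parent's heap, so a fatal error there can take down every task in the same JVM.

```java
import java.io.IOException;

public class ProcessIsolationDemo {
    public static void main(String[] args)
            throws IOException, InterruptedException {
        // Path to the java launcher of the current JVM.
        String javaBin = System.getProperty("java.home") + "/bin/java";

        // Child JVM that fails at startup -- a stand-in for a crashed
        // mapper. "NoSuchMapperClass" is a deliberately missing class.
        Process child = new ProcessBuilder(javaBin, "NoSuchMapperClass")
                .redirectErrorStream(true)
                .start();

        // The parent blocks until the child dies, sees the non-zero
        // exit code, and remains free to relaunch the task elsewhere.
        int exit = child.waitFor();
        System.out.println("child exited with code " + exit
                + "; parent JVM is unaffected and can relaunch the task");
    }
}
```

Had the failing work run as a thread instead, an `OutOfMemoryError` or native crash inside it could have destabilized the shared heap and killed the sibling tasks along with it.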