Why does a Mapper run in a heavyweight process and not in a thread?

    • #4910
      DataFlair Team
      Spectator

      Why does a Mapper / Map Task run as a process and not as a thread?
      Why is each task launched in a new (heavyweight) process rather than in a thread?

    • #4911
      DataFlair Team
      Spectator

      Each task is launched as a separate process instead of a thread because:

      • Mappers run across the Hadoop cluster in a distributed manner: the input is split and the splits are processed in parallel on different nodes.
        Threads are multiple tasks of a single process that share the same memory area and data, and they stay within the boundary of a single machine, whereas each mapper processes a different input split, usually on the node where that split resides.
      • Each Mapper task in Hadoop runs as a separate JVM process. MapReduce jobs are long-running and execute on commodity hardware, where failures are expected. If map tasks were implemented as threads, one fatal error in a single mapper (for example an out-of-memory condition or a JVM crash) would kill the entire process and every other task running inside it, so all of them would have to be re-run. A process boundary confines the failure to that one task (see the sketch after this list).
      • Managing threads is also relatively more complex: if a thread hangs it has to be killed, and the task then has to be restarted from where it left off, which is hard to do safely inside a shared process.
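
      To make the fault-isolation point concrete, below is a minimal, stand-alone Java sketch. It is not Hadoop code, and the class names (IsolationDemo, CrashingTask) are made up for illustration. It runs a "task" first as a child process and then as a thread, showing that a crash in the child process leaves the parent JVM running, while a JVM-level exit triggered from a thread terminates every task that shares the JVM.

      // IsolationDemo.java -- minimal sketch, not Hadoop code.
      public class IsolationDemo {

          public static void main(String[] args) throws Exception {
              // Case 1: the "task" runs as a separate child process.
              // Its crash (non-zero exit) does not affect this parent JVM.
              Process task = new ProcessBuilder(
                      System.getProperty("java.home") + "/bin/java",
                      "-cp", System.getProperty("java.class.path"),
                      "IsolationDemo$CrashingTask")
                      .inheritIO()
                      .start();
              System.out.println("Child task exited with code " + task.waitFor()
                      + "; the parent JVM is still running.");

              // Case 2: the same "task" runs as a thread in this JVM.
              // A JVM-level failure (simulated here with System.exit) takes
              // down the whole process and every other in-flight task.
              Thread t = new Thread(() -> System.exit(1));
              t.start();
              t.join();
              System.out.println("Never reached: the shared JVM already exited.");
          }

          // Stands in for a mapper that hits a fatal error.
          public static class CrashingTask {
              public static void main(String[] args) {
                  System.exit(1); // simulate a task-level crash
              }
          }
      }

      Hadoop follows the same idea: the TaskTracker (MR1) or NodeManager (YARN) forks each map or reduce task into its own child JVM, so a misbehaving task can be killed and rescheduled without disturbing its siblings.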

      Follow the link to learn more about Mappers in Hadoop.
