Why does a Mapper run in a heavyweight process and not in a thread?

This topic contains 1 reply, has 1 voice, and was last updated by  dfbdteam3 1 year, 6 months ago.

#4910

    dfbdteam3
    Moderator

    Why is a Mapper / Map Task a process and not a thread?
    Why is each task launched in a new (heavyweight) process rather than a thread?

    #4911

    dfbdteam3
    Moderator

    Each task is launched as a separate process rather than a thread because:

    • Mappers run across the Hadoop cluster in a distributed manner: the input is split, and the splits are processed in parallel. Threads are multiple tasks of a single process that share the same memory and data, and they are usually confined to a single machine. Each mapper, by contrast, processes a different data split, often on a different node.
    • Each Mapper task in Hadoop runs as a separate JVM process because MapReduce jobs are long-running and, on commodity hardware, tasks can fail or be killed at any time. If map tasks were implemented as threads, a single error in one mapper could kill the entire process, and every task in that process would have to be re-run (since, as stated above, they would be sub-tasks of the same process).
    • Managing threads is also relatively more complex: if a thread's execution hangs, it must be killed, and restarting just that task from where it left off is hard to do safely inside a shared process. A crashed process, on the other hand, can simply be re-launched on its own.
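
    The isolation argument above can be sketched outside of Hadoop. The snippet below is an illustrative analogy, not Hadoop code: each hypothetical `mapper` worker is launched as its own OS process (like one JVM per map task), so a crash on one "corrupt" input split does not affect the other workers, and only the failed split needs to be retried.

    ```python
    import multiprocessing as mp

    def mapper(split, out):
        # Stand-in for a map task: fails hard on a "corrupt" split.
        if split == "bad":
            raise RuntimeError("corrupt input split")
        out.put((split, len(split)))

    def run_job(splits):
        # One OS process per split, analogous to one JVM per map task.
        out = mp.Queue()
        procs = [mp.Process(target=mapper, args=(s, out)) for s in splits]
        for p in procs:
            p.start()
        for p in procs:
            p.join()
        # A worker that raised an exception exits with a nonzero code;
        # the surviving workers' results are unaffected.
        failed = [s for s, p in zip(splits, procs) if p.exitcode != 0]
        results = dict(out.get(timeout=5) for _ in range(len(splits) - len(failed)))
        return results, failed

    if __name__ == "__main__":
        results, failed = run_job(["alpha", "bad", "gamma"])
        print(results)  # {'alpha': 5, 'gamma': 5}
        print(failed)   # ['bad']
    ```

    If the workers were threads instead, the failed "task" would share the parent's memory, and a severe fault (e.g. an out-of-memory condition) could take down every task in the process at once, which is exactly what Hadoop's per-task JVMs avoid.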


