Why are slaves limited to 4,000 in Hadoop Version 1?

    • #4706
      DataFlair Team
      Spectator

      The maximum number of slaves allowed in Hadoop v1 is 4,000. Why can't a Hadoop cluster be scaled beyond that? What factors restrict scalability beyond 4,000 nodes?

    • #4707
      DataFlair Team
      Spectator

      In Hadoop 1, there are two types of daemon that control the job execution process:

      • JobTracker
      • TaskTrackers

      The JobTracker coordinates all the jobs run on the system by scheduling tasks to run on TaskTrackers.

      TaskTrackers run tasks and send progress reports to the JobTracker, which keeps a record of the overall progress of each job. If a task fails, the JobTracker can reschedule it on a different TaskTracker.
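      Conceptually, that reporting loop looks like the sketch below. This is illustrative Java only, with hypothetical simplified interfaces; it is not the real Hadoop 1 API (the actual protocol is RPC-based and carries much richer status objects):

          // Conceptual sketch only -- not the real Hadoop 1 API. It illustrates
          // the control loop described above: each TaskTracker periodically
          // reports task progress to the single JobTracker, which may hand
          // back new work in response.
          import java.util.List;

          interface JobTrackerProtocol {
              // Hypothetical RPC: receive a status report, reply with actions
              // (launch a task, kill a task, and so on).
              List<String> heartbeat(String trackerName, List<String> taskReports);
          }

          class TaskTrackerLoop {
              private final JobTrackerProtocol jobTracker;

              TaskTrackerLoop(JobTrackerProtocol jobTracker) {
                  this.jobTracker = jobTracker;
              }

              void run(String trackerName) throws InterruptedException {
                  while (true) {
                      // Gather progress of locally running tasks (stubbed out here).
                      List<String> reports = List.of("task_0001: 75% complete");
                      // One RPC per heartbeat; with thousands of trackers these
                      // all converge on a single JobTracker process.
                      List<String> actions = jobTracker.heartbeat(trackerName, reports);
                      actions.forEach(System.out::println);
                      Thread.sleep(3000); // MR1's minimum heartbeat interval was ~3 s
                  }
              }
          }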

      In MapReduce 1, the JobTracker takes care of both job scheduling (matching tasks with TaskTrackers) and task progress monitoring (keeping track of tasks, restarting failed or slow tasks, and doing task bookkeeping, such as maintaining counter totals). The JobTracker is also responsible for storing job history for completed jobs, although it is possible to run a job history server as a separate daemon to take the load off the JobTracker.
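      This centralised design is visible in the configuration: every node in a Hadoop 1 cluster points at one JobTracker address through the mapred.job.tracker property in mapred-site.xml. A minimal example (the host name below is a placeholder):

          <!-- mapred-site.xml: all TaskTrackers report to this single address -->
          <configuration>
            <property>
              <name>mapred.job.tracker</name>
              <!-- placeholder host; 8021 was the conventional JobTracker RPC port -->
              <value>jobtracker.example.com:8021</value>
            </property>
          </configuration>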

      Why can't a Hadoop cluster be scaled beyond 4,000 nodes?

      MapReduce 1 hits scalability bottlenecks in the region of 4,000 nodes and 40,000 tasks, which stem from the fact that the JobTracker has to manage both jobs and tasks.

      When the cluster size grows beyond 4,000 nodes, the JobTracker's behavior becomes unpredictable, resulting in cascading failures and deteriorating overall performance.
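      A rough back-of-envelope estimate shows why one process struggles at that scale. All the numbers below are illustrative assumptions, not measurements:

          // Back-of-envelope estimate of JobTracker load near the MR1 limit.
          // Slot count and heartbeat interval are assumed values for
          // illustration; real MR1 grows the heartbeat interval with cluster
          // size, but the state the JobTracker must hold only gets larger.
          public class JobTrackerLoadEstimate {
              public static void main(String[] args) {
                  int nodes = 4_000;                 // cluster size at the reported limit
                  int slotsPerNode = 10;             // assumed map + reduce slots per node
                  double heartbeatIntervalSec = 3.0; // assumed heartbeat interval

                  int concurrentTasks = nodes * slotsPerNode;             // 40,000
                  double heartbeatsPerSec = nodes / heartbeatIntervalSec; // ~1,333

                  // A single JVM must hold bookkeeping state for every running
                  // task (plus completed-job history) and serve all these RPCs.
                  System.out.printf("Concurrent tasks tracked: %,d%n", concurrentTasks);
                  System.out.printf("Heartbeat RPCs per second: %,.0f%n", heartbeatsPerSec);
              }
          }

      At roughly 40,000 tracked tasks and on the order of a thousand heartbeat RPCs per second against a single JVM, scheduling decisions and garbage-collection pauses start to interfere with each other, which matches the unpredictable behavior described above.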
