Why slaves limited to 4000 in Hadoop Version 1?
September 20, 2018 at 11:51 am, post #4706, by DataFlair Team (Spectator)
The maximum number of slaves allowed in Hadoop v1 is 4,000. Why can a Hadoop cluster not be scaled beyond that? What factors restrict scalability beyond 4,000 nodes?
September 20, 2018 at 11:51 am, post #4707, by DataFlair Team (Spectator)
In Hadoop 1, there are two types of daemon that control the job execution process:
- Jobtracker
- Tasktrackers
The jobtracker coordinates all the jobs run on the system by scheduling tasks to run on tasktrackers.
Tasktrackers run tasks and send progress reports to the jobtracker, which keeps a record of the overall progress of each job. If a task fails, the jobtracker can reschedule it on a different tasktracker.
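As a toy sketch (not actual Hadoop code, and all class and method names here are hypothetical), the division of labour described above might be modeled like this: the jobtracker assigns tasks, receives progress via heartbeats, and reschedules a failed task on a different tasktracker.

```python
# Toy model of MapReduce 1 coordination. This is an illustration only;
# the real JobTracker/TaskTracker protocol is far more involved.

class JobTracker:
    def __init__(self, trackers):
        self.trackers = list(trackers)   # tasktracker ids in the cluster
        self.assignments = {}            # task id -> tasktracker id
        self.progress = {}               # task id -> fraction complete

    def schedule(self, task):
        # Naive placement: pick the tasktracker with the fewest tasks.
        tracker = min(
            self.trackers,
            key=lambda t: sum(1 for tr in self.assignments.values() if tr == t),
        )
        self.assignments[task] = tracker
        self.progress[task] = 0.0
        return tracker

    def heartbeat(self, task, fraction):
        # Tasktrackers report progress back to the jobtracker.
        self.progress[task] = fraction

    def report_failure(self, task):
        # On failure, reschedule the task on a different tasktracker.
        failed_on = self.assignments[task]
        candidates = [t for t in self.trackers if t != failed_on]
        self.assignments[task] = candidates[0]
        self.progress[task] = 0.0
        return self.assignments[task]
```

Note that in this model a single object holds the state of every task in the cluster, which is exactly the centralisation that becomes a problem at scale.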
In MapReduce 1, the jobtracker takes care of both job scheduling (matching tasks with tasktrackers) and task progress monitoring (keeping track of tasks, restarting failed or slow tasks, and doing task bookkeeping, such as maintaining counter totals). The jobtracker is also responsible for storing job history for completed jobs, although it is possible to run a job history server as a separate daemon to take the load off the jobtracker.
Why can a Hadoop cluster not be scaled beyond 4,000 nodes?
MapReduce 1 hits scalability bottlenecks in the region of 4,000 nodes and 40,000 tasks, which stem from the fact that the jobtracker has to manage both jobs and tasks.
When the cluster size grows beyond 4,000 nodes, the jobtracker's behavior becomes unpredictable, resulting in cascading failures and deteriorating overall performance.
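A back-of-envelope sketch makes the bottleneck concrete. The jobtracker keeps in-memory bookkeeping for every running task in a single JVM, so its load grows with nodes × task slots per node. The slot count and per-task state size below are illustrative assumptions, not measured Hadoop figures; 10 slots per node is simply consistent with the 4,000-node / 40,000-task numbers above.

```python
# Illustrative scaling arithmetic for the MapReduce 1 jobtracker.
# slots_per_node and kb_per_task are assumed values for the sketch.

def concurrent_tasks(nodes, slots_per_node=10):
    # Every slot can host a running task the jobtracker must track.
    return nodes * slots_per_node

def jobtracker_state_mb(nodes, slots_per_node=10, kb_per_task=50):
    # Rough size of per-task bookkeeping held in the jobtracker's JVM.
    tasks = concurrent_tasks(nodes, slots_per_node)
    return tasks * kb_per_task / 1024
```

At 4,000 nodes this gives 40,000 concurrent tasks tracked by one process, and the state, heartbeat handling, and scheduling decisions all funnel through that single jobtracker, which is why YARN later split scheduling and task monitoring into separate daemons.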