Forums › Apache Spark › What is a worker node in an Apache Spark cluster?
September 20, 2018 at 4:22 pm #5832
What are the roles and responsibilities of worker nodes in an Apache Spark cluster? Is a worker node in Spark the same as a slave node?
September 20, 2018 at 4:22 pm #5833
A worker node is a node that runs the application code in the cluster, and in Spark the terms worker node and slave node mean the same thing. The master node assigns work, and the worker nodes actually perform the assigned tasks. Worker nodes process the data stored on them and report their available resources to the master; based on that resource availability, the master schedules tasks.
September 20, 2018 at 4:22 pm #5834
Apache Spark follows a master/slave architecture, with one master (driver) process and one or more slave (worker) processes.
1. The master is the driver that runs the main() program, where the SparkContext is created. It then interacts with the cluster manager to schedule job execution.
2. The workers run processes, called executors, that execute the tasks scheduled by the driver program in parallel.
Whenever a client runs the application code, the driver program instantiates the SparkContext and converts the transformations and actions into a logical DAG of execution. This logical DAG is then converted into a physical execution plan, which is broken down into smaller physical execution units (tasks). The driver then interacts with the cluster manager to negotiate the resources required to run the application code, and the cluster manager interacts with each of the worker nodes to track the executors running on them.
The role of worker nodes/executors:
1. Perform the data processing for the application code
2. Read data from and write data to external sources
3. Store computation results in memory or on disk
By default, the executors run throughout the lifetime of the Spark application; this is static allocation of executors. Alternatively, the user can let the number of executors grow and shrink with the workload; this is dynamic allocation of executors.
Before the execution of tasks, the executors register with the driver program through the cluster manager, so that the driver knows how many executors are available to perform the scheduled tasks. The executors then start executing the tasks scheduled by the driver.
Whenever a worker node fails, the tasks it was running are automatically re-assigned to other worker nodes.
For more information on how Spark works, see Spark – How it Works.