Explain Spark Architecture in brief.
- In the real world, Apache Spark operates in a master/slave fashion, with one central coordinator and many distributed workers.
- The central coordinator is called the 'Driver', while each distributed worker is called an 'executor'.
- The Driver communicates with a large number of executors.
- The Driver program runs in its own Java process, and each executor also runs in its own Java process.
- The Driver and the executors together are known as a 'Spark Application'.
- A Spark application is launched on a cluster using a Cluster Manager.
- Spark has its own built-in cluster manager, called the Standalone Cluster Manager.
- However, one can also run Spark on two popular open-source cluster managers: Hadoop YARN and Apache Mesos.
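The launch step described above goes through `spark-submit`, where the `--master` flag selects the cluster manager. A minimal sketch, assuming a hypothetical application file `my_app.py` and placeholder hostnames:

```shell
# The same application can be launched under each cluster manager;
# only the --master URL changes. Hostnames and ports below are
# illustrative placeholders, not real endpoints.

# Standalone Cluster Manager (Spark's built-in manager)
spark-submit --master spark://master-host:7077 my_app.py

# Hadoop YARN (cluster deploy mode runs the Driver inside the cluster)
spark-submit --master yarn --deploy-mode cluster my_app.py

# Apache Mesos
spark-submit --master mesos://mesos-host:5050 my_app.py
```

In each case the Cluster Manager allocates executors on the workers, and the Driver then schedules tasks on those executors.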
Spark Driver --> Cluster Manager (Standalone, YARN, Mesos) --> Workers (executors)
In reality, there are many workers below the Cluster Manager, but for simplicity only one executor is shown.
For a detailed description of the Apache Spark Ecosystem, refer to Components of Apache Spark Ecosystem.