Why Apache Spark?

  • Author
    Posts
    • #6433
      DataFlair Team
      Spectator

      What is the need of Apache Spark?

    • #6434
      DataFlair Team
      Spectator

      Before Spark, we already had many general-purpose cluster computing tools, for example Hadoop MapReduce, Apache Storm, Apache Impala, Apache Giraph, and many more. But each of them has limitations in its functionality, such as:

      1. Hadoop MapReduce allows only batch processing.
      2. For stream processing, only Apache Storm / S4 can be used.
      3. For interactive processing, we need Apache Impala / Apache Tez.
      4. For graph processing, we opt for Neo4j / Apache Giraph.

      Therefore, no single engine could perform all of these tasks together. Hence there was a big demand for a powerful engine that can process data in real time (streaming) as well as in batch mode, respond with sub-second latency, and perform in-memory processing.

      This is where Apache Spark comes into the picture. It is a powerful open-source engine that offers interactive processing, real-time stream processing, graph processing, in-memory processing, and batch processing, while providing high speed, ease of use, and a standard interface at the same time.
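
      To make the "one engine" claim concrete, here is a minimal sketch in Scala (Spark's native API) of batch and stream processing sharing a single session and a single DataFrame API. The input file events.json, the userId column, and the socket host/port are hypothetical placeholders for illustration:

        import org.apache.spark.sql.SparkSession

        object UnifiedSparkSketch {
          def main(args: Array[String]): Unit = {
            // One SparkSession drives both the batch and the streaming job.
            val spark = SparkSession.builder()
              .appName("unified-spark-sketch")
              .master("local[*]")
              .getOrCreate()

            // Batch processing: read a static file and aggregate it.
            // ("events.json" and "userId" are placeholder names.)
            val batchDf = spark.read.json("events.json")
            batchDf.groupBy("userId").count().show()

            // Stream processing: the same DataFrame API over a live source.
            // The built-in socket source stands in for a real stream here.
            val streamDf = spark.readStream
              .format("socket")
              .option("host", "localhost")
              .option("port", "9999")
              .load()

            streamDf.groupBy("value").count()
              .writeStream
              .outputMode("complete")
              .format("console")
              .start()
              .awaitTermination()
          }
        }

      The point is that both jobs run on one engine with one API; in the earlier stack, the same work would require two separate systems (for example, MapReduce plus Storm).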

      There are many more insights into Spark. To learn them all, follow the link: Apache Spark – A Complete Spark Tutorial for Beginners
