Why Apache Spark?

Viewing 1 reply thread
  • Author
    Posts
    • #6433
      DataFlair Team
      Moderator

      Why do we need Apache Spark?

    • #6434
      DataFlair Team
      Moderator

      Before Spark, we already had many cluster computing tools, for example Hadoop MapReduce, Apache Storm, Apache Impala, Apache Giraph, and many more. But each one is limited in the kind of workload it can handle:

      1. Hadoop MapReduce supports only batch processing.
      2. For stream processing, we need a dedicated engine such as Apache Storm / S4.
      3. For interactive processing, we need Apache Impala / Apache Tez.
      4. For graph processing, we opt for Neo4j / Apache Giraph.

      Therefore, no single engine could perform all of these tasks together. Hence there was a big demand for a powerful engine that can process data in real time (streaming) as well as in batch mode, and that can also respond within sub-second latency and perform in-memory processing.

      This is where Apache Spark comes into the picture. It is a powerful open-source engine that offers interactive processing, real-time stream processing, graph processing, and in-memory processing as well as batch processing, all with high speed, ease of use, and a standard interface.

      There is much more to learn about Spark. For the details, follow the link: Apache Spark – A Complete Spark Tutorial for Beginners
