How is Apache Spark better than Hadoop?

    • #6399
      DataFlair Team

      What are the cases where Apache Spark surpasses Hadoop?
      What are the benefits of Apache Spark over Apache Hadoop?

    • #6400
      DataFlair Team

      Apache Spark is a lightning-fast cluster computing tool. For in-memory workloads it can run up to 100 times faster than Hadoop MapReduce, because it keeps data in memory instead of writing it to disk between processing steps.
      Apache Spark is a general-purpose Big Data processing engine. It is commonly used on top of HDFS, although it can also read data from other storage systems. Spark suits a wide variety of data processing requirements, from batch processing to data streaming.

      Hadoop is an open-source framework for storing and processing large datasets; its MapReduce engine processes data stored in HDFS. Hadoop can process structured, semi-structured or unstructured data, but Hadoop MapReduce processes data only in batch mode.
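      To make the batch MapReduce model concrete, here is a minimal pure-Python sketch of a word count split into the map, shuffle and reduce phases. It only imitates the programming model; it is not Hadoop's Java API, and the function names (map_phase, shuffle, reduce_phase, word_count) are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def map_phase(lines: Iterable[str]) -> List[Tuple[str, int]]:
    """Emit a (word, 1) pair for every word, like a word-count Mapper."""
    return [(word, 1) for line in lines for word in line.split()]


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Group values by key, as the framework does between Map and Reduce."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return dict(groups)


def reduce_phase(groups: Dict[str, List[int]]) -> Dict[str, int]:
    """Sum the grouped counts, like a word-count Reducer."""
    return {key: sum(values) for key, values in groups.items()}


def word_count(lines: Iterable[str]) -> Dict[str, int]:
    """Run the whole batch job: map, then shuffle, then reduce."""
    return reduce_phase(shuffle(map_phase(lines)))


if __name__ == "__main__":
    data = ["spark hadoop spark", "hadoop hdfs"]
    print(word_count(data))  # {'spark': 2, 'hadoop': 2, 'hdfs': 1}
```

      The whole input must be available before the job runs and results appear only when it finishes, which is what "batch mode" means here.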

      Apache Spark surpasses Hadoop MapReduce in several respects:
      1. Spark can process data in memory, keeping it there across processing steps, which Hadoop MapReduce does not do.
      2. Spark handles batch, iterative, interactive and streaming (near-real-time) workloads, whereas Hadoop MapReduce processes data only in batch mode.
      3. Spark is faster because it keeps intermediate data in memory and therefore performs fewer disk reads and writes. In Hadoop MapReduce, the intermediate output of Map() is always written to the local hard disk.
      4. Spark is easy to program, offering over 80 high-level operators on RDDs (Resilient Distributed Datasets).
      5. Spark code is much more compact than equivalent Hadoop MapReduce code; writing it in Scala keeps programs especially short and reduces programming effort. Spark also provides rich APIs in Java, Scala, Python and R.
      6. Spark and Hadoop are both highly fault-tolerant: Spark recomputes lost data from the lineage of its RDDs, while Hadoop relies on data replication in HDFS and re-executing failed tasks.
      7. Spark applications running on Hadoop clusters can be up to 10 times faster than Hadoop MapReduce even when processing data on disk.
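      Points 1 to 3 above come down to where intermediate results live between steps of an iterative job. The following toy Python model contrasts the two approaches: one chains jobs through files on disk (MapReduce-style), the other keeps each intermediate result in memory (Spark-style). It is only an illustration under simplifying assumptions; the helpers run_disk_chained, run_in_memory and halve are hypothetical, and real Spark keeps data in memory through RDD persistence such as cache() rather than a plain loop.

```python
import json
import os
import tempfile
from typing import Callable, List, Tuple

Step = Callable[[List[float]], List[float]]


def halve(values: List[float]) -> List[float]:
    """Hypothetical per-iteration computation used for the demo."""
    return [v / 2 for v in values]


def run_disk_chained(data: List[float], step: Step, iterations: int) -> Tuple[List[float], int]:
    """MapReduce-style: every iteration writes its output to disk and the next reads it back."""
    disk_writes = 0
    current = data
    with tempfile.TemporaryDirectory() as workdir:
        for i in range(iterations):
            result = step(current)
            path = os.path.join(workdir, f"iteration_{i}.json")
            with open(path, "w") as f:
                json.dump(result, f)  # intermediate output is materialized on disk
            disk_writes += 1
            with open(path) as f:
                current = json.load(f)  # the next job starts by re-reading it
    return current, disk_writes


def run_in_memory(data: List[float], step: Step, iterations: int) -> Tuple[List[float], int]:
    """Spark-style: intermediate results stay in memory between iterations."""
    current = data
    for _ in range(iterations):
        current = step(current)  # no disk round trip
    return current, 0


if __name__ == "__main__":
    print(run_disk_chained([8.0, 4.0], halve, 3))  # ([1.0, 0.5], 3)
    print(run_in_memory([8.0, 4.0], halve, 3))     # ([1.0, 0.5], 0)
```

      Both versions compute the same answer; the difference is the disk round trip per iteration, which grows with the number of iterations and is why iterative algorithms benefit most from in-memory processing.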
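      Spark's fault tolerance relies on lineage: an RDD remembers the transformations that produced it, so a lost partition can be recomputed from its source data instead of being restored from a replica. The sketch below is a hypothetical ToyRDD class that imitates this idea in plain Python; it is not Spark's API, and its method names only loosely mirror Spark's map and filter.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

Transform = Callable[[List[int]], List[int]]


@dataclass
class ToyRDD:
    """Hypothetical stand-in for an RDD: source partitions plus a lineage of transformations."""
    source: List[List[int]]
    lineage: List[Transform] = field(default_factory=list)
    cache: Dict[int, List[int]] = field(default_factory=dict)

    def map(self, fn: Callable[[int], int]) -> "ToyRDD":
        # Transformations are lazy: they only extend the lineage.
        return ToyRDD(self.source, self.lineage + [lambda part: [fn(x) for x in part]])

    def filter(self, pred: Callable[[int], bool]) -> "ToyRDD":
        return ToyRDD(self.source, self.lineage + [lambda part: [x for x in part if pred(x)]])

    def compute(self, index: int) -> List[int]:
        """Rebuild one partition by replaying the lineage over its source data."""
        part = list(self.source[index])
        for transform in self.lineage:
            part = transform(part)
        return part

    def collect(self) -> List[int]:
        """Compute any missing partitions, then return all elements in order."""
        for i in range(len(self.source)):
            if i not in self.cache:
                self.cache[i] = self.compute(i)
        return [x for i in range(len(self.source)) for x in self.cache[i]]

    def lose_partition(self, index: int) -> None:
        """Simulate a failure that drops one computed partition."""
        self.cache.pop(index, None)


if __name__ == "__main__":
    rdd = ToyRDD([[1, 2, 3], [4, 5, 6]]).map(lambda x: x * 10).filter(lambda x: x > 20)
    print(rdd.collect())  # [30, 40, 50, 60]
    rdd.lose_partition(1)
    print(rdd.collect())  # [30, 40, 50, 60], partition 1 recomputed from lineage
```

      Only the lost partition is recomputed; the surviving one is reused as is, which is the property that lets Spark recover without replicating intermediate data.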

      For a detailed feature-by-feature comparison of Apache Spark and Hadoop MapReduce, see:
      Apache Spark vs. Hadoop MapReduce
