What are the cases where Apache Spark surpasses Hadoop?


    • #5891
      DataFlair Team
      Spectator

      How is Apache Spark better than Hadoop?
      What are the benefits of Apache Spark over Apache Hadoop?

    • #5892
      DataFlair Team
      Spectator

      We can compare Hadoop and Spark in the following areas:

      • Storage
      • Computation
      • Computation Speed
      • Resource management

      Compared to Hadoop, the main advantage of Spark is its computation speed. Spark offers:

      • Lightning-fast cluster computing.
      • A fast and general engine for large-scale data processing.

      This advantage comes from the RDD (Resilient Distributed Dataset), Spark's basic abstraction, which keeps intermediate data in memory instead of writing it to disk between steps.
      Apart from this, the Spark architecture and the Spark execution engine are two further reasons why Apache Spark is faster than Hadoop.
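
      As a rough illustration (the file path, class name, and ERROR filter below are only placeholders for this sketch), a Scala job can cache an RDD in memory and reuse it across several actions, whereas MapReduce would re-read the input from disk and write intermediate results back to disk for each pass:

      import org.apache.spark.{SparkConf, SparkContext}

      object RddCachingSketch {
        def main(args: Array[String]): Unit = {
          val conf = new SparkConf().setAppName("RddCachingSketch").setMaster("local[*]")
          val sc   = new SparkContext(conf)

          // Placeholder input path for this sketch.
          val lines  = sc.textFile("hdfs:///data/input.txt")
          val errors = lines.filter(_.contains("ERROR")).cache() // keep the filtered RDD in memory

          // Both actions reuse the cached RDD instead of re-reading the file from disk.
          println(errors.count())
          println(errors.filter(_.contains("timeout")).count())

          sc.stop()
        }
      }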

      Hadoop is ideal for batch processing, while Spark can handle batch as well as iterative, interactive, and streaming processing.

      For the differences between Apache Hadoop MapReduce and Apache Spark, refer to Apache Spark vs. Apache Hadoop MapReduce.

    • #5894
      DataFlair Team
      Spectator

      The points below are the benefits of Apache Spark over Apache Hadoop:

      1. Speed

      Apache Spark is a lightning-fast cluster computing tool, largely because it keeps data in memory.
      MapReduce reads from and writes to disk, which slows down the processing speed.

      2. Difficulty

      It is easy to program in Spark because it offers many high-level operators on the RDD (Resilient Distributed Dataset).
      In MapReduce, developers need to hand-code each and every operation, which makes development very complicated.
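
      For instance, a complete word count needs only a few high-level operators in Spark; the Scala sketch below (with a placeholder input path) replaces the separate Mapper, Reducer, and Driver classes a hand-coded MapReduce job would need:

      import org.apache.spark.{SparkConf, SparkContext}

      object WordCountSketch {
        def main(args: Array[String]): Unit = {
          val sc = new SparkContext(new SparkConf().setAppName("WordCountSketch").setMaster("local[*]"))

          // A handful of high-level operators expresses the whole job.
          val counts = sc.textFile("hdfs:///data/input.txt")   // placeholder path
            .flatMap(_.split("\\s+"))
            .map(word => (word, 1))
            .reduceByKey(_ + _)

          counts.take(10).foreach(println)
          sc.stop()
        }
      }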

      3. Easy to Manage

      Spark runs batch, interactive, machine-learning, and streaming workloads in the same cluster,
      whereas MapReduce only provides for batch processing.
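
      As a small illustration, the Scala sketch below (using toy in-memory data, so the numbers are placeholders) trains a spark.ml logistic regression in the same program, and therefore on the same cluster, that could also run ordinary batch or interactive jobs:

      import org.apache.spark.sql.SparkSession
      import org.apache.spark.ml.classification.LogisticRegression
      import org.apache.spark.ml.linalg.Vectors

      object MlOnSparkSketch {
        def main(args: Array[String]): Unit = {
          val spark = SparkSession.builder().appName("MlOnSparkSketch").master("local[*]").getOrCreate()

          // Toy labelled points, just to show machine learning running alongside batch work.
          val training = spark.createDataFrame(Seq(
            (1.0, Vectors.dense(0.0, 1.1, 0.1)),
            (0.0, Vectors.dense(2.0, 1.0, -1.0)),
            (0.0, Vectors.dense(2.0, 1.3, 1.0)),
            (1.0, Vectors.dense(0.0, 1.2, -0.5))
          )).toDF("label", "features")

          val model = new LogisticRegression().setMaxIter(10).fit(training)
          println(s"Coefficients: ${model.coefficients}")

          spark.stop()
        }
      }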

      4. Latency

      Spark provides low-latency computing.
      MapReduce is a high-latency computing framework.

      5. Interactive mode

      Spark can process data interactively.
      MapReduce doesn't have an interactive mode.
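
      For example, a spark-shell (Scala REPL) session might look like the sketch below, where the log path is only a placeholder; each action returns its result immediately, so the next query can be refined on the spot:

      // Typed line by line into bin/spark-shell, where sc is predefined.
      val logs = sc.textFile("hdfs:///data/access.log")      // placeholder path
      logs.filter(_.contains("404")).count()                 // inspect the result...
      logs.filter(_.contains("500")).take(5)                 // ...then refine the next query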

      6. Streaming

      Spark can process real-time data through Spark Streaming.
      With MapReduce, you can only process data in batch mode.
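
      A minimal Spark Streaming sketch (assuming a text source on localhost:9999, which is only a placeholder) that counts words in one-second micro-batches:

      import org.apache.spark.SparkConf
      import org.apache.spark.streaming.{Seconds, StreamingContext}

      object StreamingWordCountSketch {
        def main(args: Array[String]): Unit = {
          val conf = new SparkConf().setAppName("StreamingWordCountSketch").setMaster("local[2]")
          val ssc  = new StreamingContext(conf, Seconds(1)) // 1-second micro-batches

          // Placeholder host and port; any streaming text source would do.
          val lines  = ssc.socketTextStream("localhost", 9999)
          val counts = lines.flatMap(_.split("\\s+")).map(word => (word, 1)).reduceByKey(_ + _)
          counts.print()

          ssc.start()
          ssc.awaitTermination()
        }
      }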

      7. Ease of use

      Spark is easier to use: its abstraction (the RDD) lets users process data with high-level operators, and it provides rich APIs in Java, Scala, Python, and R.
      MapReduce is complex; developers have to work with low-level APIs, which requires a lot of hand coding.

      For a comparison between Apache Hadoop MapReduce and Apache Spark, refer to Apache Spark vs. Apache Hadoop MapReduce.
