How is Apache Spark better than Hadoop?
DataFlair Team (Moderator), September 20, 2018 at 9:43 pm, post #6399
What are the cases where Apache Spark surpasses Hadoop?
What are the benefits of Apache Spark over Apache Hadoop?
DataFlair Team (Moderator), September 20, 2018 at 9:43 pm, post #6400
Apache Spark is a lightning-fast cluster computing tool. For in-memory workloads it can run up to 100 times faster than Hadoop MapReduce, largely because it processes data in memory instead of reading from and writing to disk at every step.
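To make the in-memory argument concrete, here is a minimal plain-Python sketch (not Spark or Hadoop code) of a word count run two ways. `word_count_via_disk` mimics MapReduce by writing the map output to a temporary file and reading it back for the reduce step; `word_count_in_memory` passes the intermediate pairs straight through. All function names are hypothetical. This is a simplified model: real Spark still writes shuffle data to local disk, so treat it as an illustration of the I/O pattern, not of Spark's internals.

```python
import json
import os
import tempfile
from collections import defaultdict


def map_phase(lines):
    """Emit (word, 1) pairs, like a MapReduce mapper."""
    for line in lines:
        for word in line.split():
            yield word, 1


def reduce_phase(pairs):
    """Sum the counts for each word, like a MapReduce reducer."""
    counts = defaultdict(int)
    for word, n in pairs:
        counts[word] += n
    return dict(counts)


def word_count_via_disk(lines):
    """MapReduce-style (simplified): spill map output to a local file, then read it back."""
    fd, path = tempfile.mkstemp(suffix=".jsonl")
    try:
        with os.fdopen(fd, "w") as f:
            for pair in map_phase(lines):
                f.write(json.dumps(pair) + "\n")
        with open(path) as f:
            pairs = (tuple(json.loads(row)) for row in f)
            return reduce_phase(pairs)  # consumed fully before the file closes
    finally:
        os.remove(path)


def word_count_in_memory(lines):
    """In-memory style: hand the intermediate pairs directly to the reduce step."""
    return reduce_phase(map_phase(lines))


lines = ["spark keeps data in memory", "hadoop writes map output to disk"]
assert word_count_via_disk(lines) == word_count_in_memory(lines)
print(word_count_in_memory(lines)["to"])  # 1
```

Both functions return identical counts; the only difference is the round trip through the file system, which stands in for the disk I/O that MapReduce performs between its map and reduce phases.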
Apache Spark is a big data framework: a general-purpose data processing engine that is commonly deployed on top of HDFS. It suits a wide variety of data processing requirements, from batch processing to data streaming.
Apache Spark surpasses Hadoop in many cases, such as:
1. It processes data in memory, which Hadoop MapReduce does not do: MapReduce reads its input from disk and writes its results back to disk for every job.
2. It handles batch, iterative, interactive, and streaming (real-time) workloads, whereas Hadoop MapReduce supports only batch processing.
3. Spark is faster because it performs fewer disk read/write operations, keeping intermediate data in memory where possible. In Hadoop MapReduce, the intermediate output (the output of Map()) is always written to the local hard disk.
4. Apache Spark is easy to program because RDDs (Resilient Distributed Datasets) come with many high-level operators, such as map, filter, reduceByKey, and join.
5. Apache Spark code is more compact than the equivalent Hadoop MapReduce code. Writing it in Scala keeps programs especially short and reduces programming effort. Spark also provides rich APIs in Java, Scala, Python, and R.
6. Both Spark and Hadoop are highly fault tolerant.
7. Spark applications running on Hadoop clusters can be up to 10 times faster than Hadoop MapReduce even when the data is processed on disk.
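Point 6 does not say how Spark achieves fault tolerance. Spark records the lineage of each RDD, meaning the chain of transformations that produced it, and recomputes lost partitions from that chain instead of replicating intermediate data. The toy class below, `LineageDataset` (a hypothetical name, not a Spark API), is a minimal plain-Python sketch of that idea: each dataset remembers its parent and the function that derived it, so a partition dropped by a simulated failure can be rebuilt on demand.

```python
class LineageDataset:
    """Toy dataset that records how it was derived (its lineage). Not a Spark API."""

    def __init__(self, partitions=None, parent=None, fn=None):
        self._source = partitions  # root only: stands in for data on stable storage such as HDFS
        self.parent = parent       # dataset this one was derived from
        self.fn = fn               # per-element transformation applied to the parent
        self._cache = {}           # partition index -> materialized list

    def num_partitions(self):
        if self.parent is None:
            return len(self._source)
        return self.parent.num_partitions()

    def map(self, fn):
        # Transformations are lazy: only the lineage is recorded here.
        return LineageDataset(parent=self, fn=fn)

    def compute(self, i):
        """Return partition i, rebuilding it from the lineage if it is missing."""
        if i not in self._cache:
            if self.parent is None:
                self._cache[i] = list(self._source[i])
            else:
                self._cache[i] = [self.fn(x) for x in self.parent.compute(i)]
        return self._cache[i]

    def lose_partition(self, i):
        """Simulate a node failure that drops an in-memory partition."""
        self._cache.pop(i, None)

    def collect(self):
        return [x for i in range(self.num_partitions()) for x in self.compute(i)]


base = LineageDataset(partitions=[[1, 2], [3, 4]])
squared = base.map(lambda x: x * x)
print(squared.collect())  # [1, 4, 9, 16]
squared.lose_partition(1)
print(squared.collect())  # [1, 4, 9, 16] (partition 1 recomputed from its lineage)
```

Only the lost partition is recomputed; the cached partition 0 is reused as-is, which is why lineage-based recovery can be cheaper than keeping replicated copies of every intermediate result.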
For a detailed, feature-by-feature comparison of Apache Spark and Hadoop MapReduce, see:
Apache Spark vs. Hadoop MapReduce