Features and characteristics of Apache Spark

This topic has 1 reply, 1 voice, and was last updated 5 years, 7 months ago by DataFlair Team.

- September 20, 2018 at 12:09 pm #4755
  
  DataFlair Team
  Spectator
  
  What are the features and characteristics of Apache Spark that make it superior to other Big Data solutions like Hadoop MapReduce?
- September 20, 2018 at 12:10 pm #4762
  DataFlair Team
  Spectator
  Apache Spark is a next-generation Big Data tool, often described as the future of Big Data and the successor to Hadoop MapReduce. Its main features are:
  1. Speed: Speed always matters when processing data, and organizations want to process large volumes of data as fast as possible. Spark is built for fast processing, including complex workloads. It is based on the RDD (Resilient Distributed Dataset) abstraction, which lets it keep data in memory transparently. This reduces reads and writes to disk, one of the main time-consuming factors.
  2. Usability: Support for multiple languages makes it flexible. You can quickly write applications in Java, Scala, Python, and R.
  3. In-Memory Computing: Keeping data in the servers’ RAM makes access to stored data fast. In-memory analytics accelerates iterative machine learning algorithms because it avoids the round trip of reading data from and writing it to disk on every pass.
  4. Pillar to Sophisticated Analytics: In addition to simple map and reduce operations, Spark comes with tools for interactive/declarative queries, streaming data, and machine learning, and users can combine all of these in a single workflow.
  5. Real-Time Stream Processing: Spark Streaming handles real-time stream processing and integrates with other frameworks, so Spark’s streaming support is easy to use, fault tolerant, and integrated.
  6. Compatibility with Hadoop & existing Hadoop Data: Spark is compatible with both versions of the Hadoop ecosystem: it can run on YARN (Yet Another Resource Negotiator) or through SIMR (Spark In MapReduce). It can read any existing Hadoop data, which makes it suitable for migrating pure Hadoop MapReduce applications. It can also run independently.
  7. Lazy Evaluation: Another notable feature of Spark, also known as call-by-need. Spark records the transformations you request and only executes them when an action asks for a final result, which can save significant time by avoiding unnecessary work.
  8. Active, progressive and expanding community: Spark is built by a wide set of developers from over 100 companies. It has active mailing lists and uses JIRA for issue tracking. It is one of the most active projects in the Apache repository.
  For details about Spark, follow: Apache Spark Introductory Guide
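
To make points 3 and 7 a bit more concrete, here is a minimal sketch in plain Python. It is a toy stand-in, not Spark's API: the class `LazyDataset` and its `map`, `filter`, `cache`, and `collect` methods are hypothetical names chosen only to mirror the idea that transformations are recorded first, run when an action is called, and can have their result kept in memory for reuse.

```python
class LazyDataset:
    """Toy illustration only (not Spark): records steps, runs them on an action."""

    def __init__(self, source, steps=()):
        self._source = source            # the input data
        self._steps = tuple(steps)       # recorded transformations, not yet executed
        self._keep_in_memory = False     # set by cache()
        self._cached = None              # materialized result, if cached

    def map(self, fn):
        # Transformation: only records the step and returns a new dataset.
        return LazyDataset(self._source, self._steps + (("map", fn),))

    def filter(self, pred):
        # Transformation: only records the step and returns a new dataset.
        return LazyDataset(self._source, self._steps + (("filter", pred),))

    def cache(self):
        # Ask for the result to be kept in memory once it has been computed.
        self._keep_in_memory = True
        return self

    def collect(self):
        # Action: this is the point where the recorded steps actually run.
        if self._cached is not None:
            return list(self._cached)    # reuse the in-memory result
        items = iter(self._source)
        for kind, fn in self._steps:
            items = map(fn, items) if kind == "map" else filter(fn, items)
        result = list(items)
        if self._keep_in_memory:
            self._cached = result
        return list(result)


calls = []  # records every time square() really runs


def square(x):
    calls.append(x)
    return x * x


evens = LazyDataset(range(5)).map(square).filter(lambda v: v % 2 == 0).cache()
calls_before_action = len(calls)   # nothing has been computed yet

first = evens.collect()            # the action triggers the computation
calls_after_first = len(calls)

second = evens.collect()           # served from memory, square() is not re-run
calls_after_second = len(calls)
```

In this sketch, building the pipeline does no work at all, the first `collect()` runs `square` once per input element, and the second `collect()` reuses the stored result. Real Spark adds distribution, fault tolerance, and query planning on top of the same basic idea.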