Features and characteristics of Apache Spark


  • Author
    Posts
    • #4755
      DataFlair Team
      Spectator

      What are the features and characteristics of Apache Spark that make it superior to other Big Data solutions such as Hadoop MapReduce?

    • #4762
      DataFlair Team
      Spectator

      Apache Spark is a next-generation Big Data tool, widely regarded as the future of Big Data and the successor to Hadoop MapReduce. Its main features are:

      1. Speed: Speed always matters in data processing, and organizations want to process large volumes of data as fast as possible. Spark is built around the RDD (Resilient Distributed Dataset), which lets it keep data in memory transparently. This reduces reads from and writes to disk, one of the main time-consuming factors, and makes complex processing faster.
      2. Usability: Spark supports multiple languages, so you can quickly write applications in Java, Scala, Python, or R.
      3. In-Memory Computing: Spark keeps data in the servers' RAM, so stored data can be accessed quickly. In-memory analytics accelerates iterative machine learning algorithms because it avoids the round trip of reading data from disk and writing it back on every iteration.
      4. Pillar of Sophisticated Analytics: In addition to simple map and reduce operations, Spark comes with tools for interactive/declarative queries, streaming data, and machine learning, and users can combine all of these in a single workflow.
      5. Real-Time Stream Processing: Spark Streaming handles real-time stream processing and integrates with other frameworks, so Spark's streaming support is easy to use, fault tolerant, and integrated.
      6. Compatibility with Hadoop and Existing Hadoop Data: Spark is compatible with both versions of the Hadoop ecosystem, whether it runs on YARN (Yet Another Resource Negotiator) or through SIMR (Spark in MapReduce). It can read any existing Hadoop data, which makes it suitable for migrating pure Hadoop MapReduce applications. It can also run independently of Hadoop.
      7. Lazy Evaluation: Another outstanding feature of Spark, also known as call-by-need. Transformations are only recorded when they are declared. Spark waits until an action asks for a result before executing them, which avoids unnecessary computation and saves significant time.
      8. Active, Progressive, and Expanding Community: Spark is built by a wide set of developers from over 100 companies. It has active mailing lists and uses JIRA for issue tracking. It is one of the most active projects at Apache.
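
To make points 3 and 7 more concrete, here is a minimal pure-Python sketch of lazy evaluation combined with in-memory caching. It is a toy model, not Spark's implementation: the `LazyDataset` class, the `read_source` function, and the `reads` list are all hypothetical names made up for this illustration. The method names were chosen to mirror the real RDD methods `filter`, `map`, `cache`, `collect`, and `count`, but nothing here is distributed or fault tolerant.

```python
class LazyDataset:
    """Toy stand-in for an RDD (hypothetical, not the Spark API)."""

    def __init__(self, parent=None, op=None, source=None):
        self._parent = parent    # dataset this one was derived from
        self._op = op            # function: iterable of rows -> iterable of rows
        self._source = source    # zero-argument reader, set only on the root
        self._keep = False       # True once cache() has been requested
        self._stored = None      # rows kept in memory after the first action

    # Transformations: only record what to do, nothing is computed here.
    def map(self, fn):
        return LazyDataset(parent=self, op=lambda rows: (fn(r) for r in rows))

    def filter(self, pred):
        return LazyDataset(parent=self, op=lambda rows: (r for r in rows if pred(r)))

    def cache(self):
        self._keep = True
        return self

    def _compute(self):
        if self._stored is not None:
            return self._stored              # served from memory
        if self._parent is None:
            rows = list(self._source())      # the only place the source is read
        else:
            rows = list(self._op(self._parent._compute()))
        if self._keep:
            self._stored = rows
        return rows

    # Actions: these trigger the recorded transformations.
    def collect(self):
        return list(self._compute())

    def count(self):
        return len(self._compute())


reads = []  # one entry per real read of the (pretend) on-disk source


def read_source():
    reads.append("read")
    return range(1, 11)


numbers = LazyDataset(source=read_source)
evens_squared = numbers.filter(lambda n: n % 2 == 0).map(lambda n: n * n).cache()

reads_before_action = len(reads)   # 0: declaring transformations read nothing
first = evens_squared.collect()    # first action: reads the source, runs the pipeline
total = evens_squared.count()      # second action: answered from the cached rows
reads_after_actions = len(reads)   # 1: the source was read only once
```

Without the `cache()` call, the second action would walk back to the source and read it again. That is roughly the trade-off in Spark as well: data that several actions reuse is worth keeping in memory, while one-off pipelines do not need it.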

      For details about Spark, see the Apache Spark Introductory Guide.
