Forums › Apache Spark › What is Apache Spark?
September 20, 2018 at 12:46 pm · #4922 · DataFlair Team (Spectator)
Can you explain in detail what Apache Spark is? What are the features of Spark, and what types of big data problems can Spark solve?
September 20, 2018 at 12:47 pm · #4925 · DataFlair Team (Spectator)
Spark is an open-source big data framework. It offers expressive APIs that let big data professionals efficiently execute both streaming and batch workloads. It provides a fast, general-purpose data processing engine, designed from the start for fast computation. It was developed at UC Berkeley in 2009 and is now an Apache project, often described as “lightning-fast cluster computing”. Spark distributes data across the cluster and processes it in parallel. It covers a wide range of workloads, including batch applications, iterative algorithms, interactive queries, and streaming, and it lets you write applications in Java, Python, or Scala.
It was developed to overcome the limitations of the MapReduce cluster computing paradigm. Spark keeps intermediate data in memory, whereas MapReduce keeps shuffling data in and out of disk. Spark also lets you cache data in memory, which benefits iterative algorithms such as those used in machine learning.
Spark is easier to develop with because it understands how to operate on structured data. It supports SQL queries, streaming data, and graph data processing. Spark does not need Hadoop to run; it can run on its own using other storage systems, such as Cassandra or S3, from which it can read and write. In terms of speed, Spark runs programs up to 100x faster in memory, or 10x faster on disk, than MapReduce. You can refer to What is Apache Spark for deeper insight.