- This topic has 1 reply, 1 voice, and was last updated 5 years, 8 months ago by DataFlair Team.
September 20, 2018 at 3:55 pm #5677 DataFlair Team (Spectator)
What is Apache Spark?
Why is Spark booming in the industry?
Spark carries a very high weightage in almost all job descriptions.
September 20, 2018 at 3:56 pm #5680 DataFlair Team (Spectator)
Apache Spark is a powerful, flexible, open source data processing framework built around speed, ease of use, and sophisticated analytics. It is a lightning-fast cluster computing system. Spark can run on Hadoop, standalone, or in the cloud, and it can access data from various sources, including HDFS, HBase, and Cassandra.
Because Spark does its computing in memory across the cluster, it does not need to keep writing intermediate data to disk and reading it back between steps. This results in faster data processing in Spark.
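To make the point about disk round trips concrete, here is a minimal sketch in plain Python (not Spark itself). It runs the same two-stage job twice: once writing the intermediate result to a temporary file between stages, as a MapReduce-style job would, and once keeping it in memory. The function names and the tiny dataset are made up for illustration only.

```python
import json
import os
import tempfile


def stage_square(values):
    """First stage: square every value."""
    return [v * v for v in values]


def stage_sum(values):
    """Second stage: add up the squared values."""
    return sum(values)


def run_with_disk(values):
    """MapReduce-style: persist the intermediate result, then read it back."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "intermediate.json")
        with open(path, "w") as f:
            json.dump(stage_square(values), f)  # write between stages
        with open(path) as f:
            intermediate = json.load(f)  # read back for the next stage
    return stage_sum(intermediate)


def run_in_memory(values):
    """In-memory style: hand the intermediate result straight to the next stage."""
    intermediate = stage_square(values)
    return stage_sum(intermediate)


data = list(range(1, 6))  # hypothetical input: 1..5
disk_result = run_with_disk(data)
memory_result = run_in_memory(data)
```

Both versions give the same answer. The difference is only the extra write and read between stages, and that cost grows with the data size and the number of stages.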
Spark has several advantages over other big data and MapReduce technologies such as Hadoop and Storm. A few of them are:
1. Speed
It can run programs up to 100 times faster than Hadoop MapReduce in memory, or up to 10 times faster on disk.
2. Ease of Use
Spark has easy-to-use APIs for operating on large data sets. These include a collection of over 100 operators for transforming data, and familiar DataFrame APIs for manipulating semi-structured data.
Applications can be written in Java, Scala, Python, or R.
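To give a feel for what chaining such operators looks like, here is a small word count sketched in plain Python rather than Spark. The helpers `flat_map` and `reduce_by_key` are hypothetical stand-ins that mimic the roles of the PySpark RDD operators `flatMap` and `reduceByKey`; they run on a local list, not on a cluster.

```python
from collections import defaultdict
from functools import reduce


def flat_map(func, items):
    """Apply func to each item and flatten the resulting lists into one list."""
    return [out for item in items for out in func(item)]


def reduce_by_key(func, pairs):
    """Group (key, value) pairs by key and combine each group's values with func."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return {key: reduce(func, values) for key, values in groups.items()}


# Hypothetical input lines, just for illustration.
lines = ["spark is fast", "spark is easy to use"]

words = flat_map(str.split, lines)                   # split lines into words
pairs = [(word, 1) for word in words]                # map each word to (word, 1)
counts = reduce_by_key(lambda a, b: a + b, pairs)    # sum the 1s per word
```

In PySpark the same pipeline would look roughly like `sc.textFile(path).flatMap(lambda l: l.split()).map(lambda w: (w, 1)).reduceByKey(lambda a, b: a + b)`, assuming an existing SparkContext named `sc` and an input location `path` of your own.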
3. A Unified Engine
Spark comes with higher-level libraries, including support for SQL queries, streaming data, machine learning and graph processing.
4. Runs Everywhere
Spark can run on top of Hadoop, Mesos, standalone, or in the cloud.
Below is a brief overview of the Spark ecosystem and its components.
It consists of:
Spark Streaming: Spark Streaming is used for processing real-time streaming data.
Spark SQL: Spark SQL is a library on top of Spark that lets us run SQL queries on Spark data.
Spark MLlib: MLlib is Spark’s scalable machine learning library.
Spark GraphX: GraphX is for graphs and graph-parallel computation.
For more on Apache Spark, see:
Apache Spark Introduction