What is Apache Spark?

  • Author
    Posts
    • #5677
      DataFlair Team
      Spectator

      What is Apache Spark?
      Why is Spark booming in the industry?
      Spark carries a lot of weight in almost every job description.

    • #5680
      DataFlair Team
      Spectator

      Apache Spark is a powerful, flexible open source data processing framework built around speed, ease of use, and sophisticated analytics. It is a lightning-fast cluster computing system. Spark can run on Hadoop, standalone, or in the cloud, and it can access data from various sources, including HDFS, HBase, Cassandra, and others.

      Because Spark computes in memory across the cluster, it does not need to keep shuffling data in and out of disk between processing steps. This results in faster data processing.

      Spark has several advantages over other big data and MapReduce technologies such as Hadoop and Storm. A few of them are:
      1. Speed
      Spark can run programs up to 100 times faster than Hadoop MapReduce in memory, or 10 times faster on disk.
      2. Ease of Use
      Spark has easy-to-use APIs for operating on large data sets. These include a collection of over 100 operators for transforming data and familiar data frame APIs for manipulating semi-structured data.
      Applications can be written in Java, Scala, Python, or R.
      3. A Unified Engine
      Spark comes with higher-level libraries, including support for SQL queries, streaming data, machine learning, and graph processing.
      4. Runs Everywhere
      Spark can run on top of Hadoop, on Mesos, standalone, or in the cloud.

      Spark ecosystem

      Below is a brief overview of the Spark ecosystem and its components.
      It consists of:
      Spark Streaming: Spark Streaming is used for processing real-time streaming data.
      Spark SQL: Spark SQL is a library built on top of Spark that lets us run SQL queries on Spark data.
      Spark MLlib: MLlib is Spark’s scalable machine learning library.
      Spark GraphX: GraphX is for graphs and graph-parallel computation.

      For more on Apache Spark, see:
      Apache Spark Introduction
