Learn Apache Spark from Scratch

Getting Started with Spark

Install Spark on your machine and get started today.

Exploring the Framework

Let’s take a look at some facts about Spark and its philosophies.

Spark first showed up at UC Berkeley’s AMPLab in 2009. In 2010, it was open-sourced under a BSD license. Then in 2013, Zaharia donated the project to the Apache Software Foundation under an Apache 2.0 license, and by February 2014 it had become a top-level Apache project. Today, Spark is an open-source, distributed, general-purpose cluster-computing framework maintained by the Apache Software Foundation. It provides an interface for programming entire clusters with implicit data parallelism and fault tolerance.

Essentially, Apache Spark is a unified analytics engine for large-scale data processing.

Apache Spark founder Matei Zaharia

What makes Spark so popular?

The project lists the following benefits:

1. Speed- Spark can run workloads up to 100x faster than Hadoop MapReduce. Using a state-of-the-art DAG scheduler, a query optimizer, and a physical execution engine, it achieves high performance for both batch and streaming data.
2. Ease of Use- Spark lets you quickly write applications in languages such as Java, Scala, Python, R, and SQL. With over 80 high-level operators, it is easy to build parallel apps, and these operators can be used interactively from the Scala, Python, R, and SQL shells.
3. Generality- Spark combines SQL, streaming, and complex analytics. With a stack of libraries including SQL and DataFrames, MLlib for machine learning, GraphX for graph processing, and Spark Streaming, you can combine these capabilities in a single application.
4. Runs Everywhere- Spark runs on Hadoop, Apache Mesos, or Kubernetes. It can also run standalone or in the cloud, and it can access diverse data sources.