Getting Started with Spark
Install Spark on your machine and get started today.
Apache Spark Concepts
- Spark Tutorial
- Introduction to Apache Spark
- What is Apache Spark
- What is Spark
- Install Spark on Ubuntu
- Spark Installation in Standalone Mode
- Install Apache Spark on Multi-Node Cluster
- Apache Spark Terminologies
- Apache Spark Ecosystem
- How Apache Spark Works
- Reasons To Learn Apache Spark
- Features of Apache Spark
- Spark Shell Commands
- SparkContext
- Spark Notes
- Apache Spark and Scala Books
- Apache Spark Careers and Job Opportunities
Intermediate
- Spark In-Memory Computing
- Lazy Evaluation in Apache Spark
- Fault Tolerance in Apache Spark
- DAG in Apache Spark
- Apache Spark Cluster Managers
- Apache Spark Compatibility with Hadoop
- Spark Performance Tuning
- Apache Spark Executor
- Spark Stage
- Limitations of Apache Spark
- Data Type Mapping Between R and Spark
- Apache Mesos Tutorial
- Apache Mesos Books
- Spark Dataset Tutorial
- Apache Spark Use Cases
- Big Data Use Cases – Hadoop, Spark and Flink Case Studies
- Apache Spark Certifications
Apache Spark Advanced Concepts
- Spark Streaming Tutorial
- Apache Spark DStream (Discretized Streams)
- Apache Spark Streaming Transformation Operations
- Spark Streaming Checkpoint
- GraphX API in Apache Spark
- Spark GraphX Features
- Spark Machine Learning with R
- Apache Spark MLlib Algorithms
- Spark MLlib Data Types
- Apache Spark Machine Learning Algorithm
Apache Spark Project

Wipe the slate clean and learn PySpark from scratch
- Introduction to PySpark
- Pros and Cons of PySpark
- PySpark SparkFiles & their Class Methods
- PySpark RDD with Operations & Commands
- PySpark Career Opportunities
- Best PySpark Books

Level up to more exciting and challenging chapters
- PySpark SparkConf – Attributes & Applications
- PySpark SparkContext and its Parameters
- PySpark MLlib – Algorithms & Parameters
- PySpark Profiler – Methods & Functions
- PySpark Serializers – Marshal & Pickle

Master new skills and evolve as an expert
- PySpark StorageLevel
- PySpark StatusTracker(jtracker)
- PySpark Broadcast & Accumulator
- PySpark Interview Questions
Exploring the Framework
Let’s take a look at some facts about Spark and its philosophies.
Spark first showed up at UC Berkeley's AMPLab in 2009. In 2010, it was open-sourced under a BSD license. Then in 2013, Matei Zaharia donated the project to the Apache Software Foundation, which relicensed it under Apache 2.0. By February 2014, it had become a top-level Apache project. Today, Spark is an open-source, distributed, general-purpose cluster-computing framework maintained by the Apache Software Foundation. It provides an interface for programming entire clusters with implicit data parallelism and fault tolerance.
Essentially, Apache Spark is a unified analytics engine for large-scale data processing.

What makes Spark so popular?
The project lists the following benefits:
1. Speed – Spark runs workloads up to 100x faster than Hadoop MapReduce. It achieves this by minimizing disk read/write operations for intermediate results: it keeps them in memory and touches disk only when essential (see the caching sketch after this list).
2. Ease of Use – Spark lets you write applications quickly in languages such as Java, Scala, Python, R, and SQL. With over 80 high-level operators, it is easy to build parallel apps, and you can use them interactively from the Scala, Python, R, and SQL shells (a word-count sketch follows this list).
3. Generality – Spark combines SQL, streaming, and complex analytics. With a stack of libraries including SQL and DataFrames, MLlib for machine learning, GraphX, and Spark Streaming, you can also combine these into one application (see the SQL-plus-MLlib sketch below).
4. Runs Everywhere – Spark runs on Hadoop, Apache Mesos, or Kubernetes. It also runs standalone or in the cloud, and it can access diverse data sources.
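
To make the speed point concrete, here is a minimal PySpark sketch of in-memory computing: caching a DataFrame so that repeated actions are served from executor memory instead of being recomputed. It assumes a working local Spark installation; the dataset is synthetic.

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("InMemoryDemo").getOrCreate()

# A synthetic DataFrame stands in for a real dataset.
df = spark.range(1_000_000)

# cache() keeps the data in executor memory, so later actions avoid
# recomputing intermediate results or re-reading them from disk.
df.cache()
print(df.count())                          # first action materializes the cache
print(df.filter(df.id % 2 == 0).count())   # served from memory

spark.stop()
```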
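The ease-of-use claim is easiest to see in code. The following sketch counts words with a handful of Spark's high-level DataFrame operators; the input path "words.txt" is hypothetical.

```python
from pyspark.sql import SparkSession
from pyspark.sql.functions import explode, split

spark = SparkSession.builder.appName("WordCount").getOrCreate()

# Read plain text; each line lands in a single "value" column.
lines = spark.read.text("words.txt")

# split, explode, and groupBy are three of Spark's 80+ high-level operators.
counts = (lines
          .select(explode(split(lines.value, r"\s+")).alias("word"))
          .groupBy("word")
          .count())

counts.show()
spark.stop()
```

The same lines can be typed interactively into the pyspark shell.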
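Generality means the libraries compose. Below is a sketch, using assumed synthetic data, that prepares a training set with Spark SQL and then fits an MLlib model on it in the same application.

```python
from pyspark.sql import SparkSession
from pyspark.ml.feature import VectorAssembler
from pyspark.ml.regression import LinearRegression

spark = SparkSession.builder.appName("UnifiedDemo").getOrCreate()

# Tiny synthetic dataset following y = x1 + 2*x2 (hypothetical numbers).
df = spark.createDataFrame(
    [(1.0, 2.0, 5.0), (2.0, 3.0, 8.0), (3.0, 4.0, 11.0)],
    ["x1", "x2", "y"])
df.createOrReplaceTempView("points")

# Spark SQL and the DataFrame API operate on the same data.
train = spark.sql("SELECT x1, x2, y FROM points")

# MLlib consumes the SQL result directly, in the same application.
assembler = VectorAssembler(inputCols=["x1", "x2"], outputCol="features")
model = LinearRegression(featuresCol="features", labelCol="y") \
            .fit(assembler.transform(train))

print(model.coefficients)
spark.stop()
```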