

{"id":1161,"date":"2016-09-21T11:53:34","date_gmt":"2016-09-21T11:53:34","guid":{"rendered":"http:\/\/data-flair.training\/blogs\/?p=1161"},"modified":"2018-11-21T10:21:09","modified_gmt":"2018-11-21T04:51:09","slug":"what-is-spark","status":"publish","type":"post","link":"https:\/\/data-flair.training\/blogs\/what-is-spark\/","title":{"rendered":"What is Spark &#8211; Apache Spark Tutorial for Beginners"},"content":{"rendered":"<h2>1. Objective &#8211; Spark Tutorial<\/h2>\n<p>What is Spark? Why is there so much buzz about this technology? I hope this introductory Spark tutorial helps answer these questions.<\/p>\n<p>Apache Spark is an open-source cluster computing system that provides high-level APIs in Java, Scala, Python, and R. It can access data from HDFS, Cassandra, <a href=\"http:\/\/data-flair.training\/blogs\/hbase-tutorial-beginners-guide\/\"><strong>HBase<\/strong><\/a>, <a href=\"https:\/\/data-flair.training\/blogs\/apache-hive-tutorial\/\"><strong>Hive<\/strong><\/a>, Tachyon, and any Hadoop data source, and it can run on the <a href=\"https:\/\/data-flair.training\/blogs\/apache-spark-cluster-managers-tutorial\/\"><strong>Standalone, YARN, and Mesos cluster managers<\/strong><\/a>.<br \/>\nThis tutorial covers the Spark ecosystem components, a Spark video tutorial, the core Spark abstraction (RDD), and the transformations and actions available on RDDs. 
The objective of this introductory guide is to give a detailed overview of Spark: its history, architecture, deployment model, and RDDs.<\/p>\n<div id=\"attachment_43018\" style=\"width: 1210px\" class=\"wp-caption aligncenter\"><a href=\"https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2016\/09\/Spark-Tutorial-for-Beginners-01.jpg\"><img loading=\"lazy\" decoding=\"async\" aria-describedby=\"caption-attachment-43018\" class=\"size-full wp-image-43018\" src=\"https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2016\/09\/Spark-Tutorial-for-Beginners-01.jpg\" alt=\"What is Spark - Apache Spark Tutorial for Beginners\" width=\"1200\" height=\"628\" srcset=\"https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2016\/09\/Spark-Tutorial-for-Beginners-01.jpg 1200w, https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2016\/09\/Spark-Tutorial-for-Beginners-01-150x79.jpg 150w, https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2016\/09\/Spark-Tutorial-for-Beginners-01-300x157.jpg 300w, https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2016\/09\/Spark-Tutorial-for-Beginners-01-768x402.jpg 768w, https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2016\/09\/Spark-Tutorial-for-Beginners-01-1024x536.jpg 1024w, https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2016\/09\/Spark-Tutorial-for-Beginners-01-520x272.jpg 520w\" sizes=\"auto, (max-width: 1200px) 100vw, 1200px\" \/><\/a><p id=\"caption-attachment-43018\" class=\"wp-caption-text\">What is Spark &#8211; Apache Spark Tutorial for Beginners<\/p><\/div>\n<h2>2. What is Spark?<\/h2>\n<p>Apache Spark is a fast, general-purpose cluster computing system. It provides a high-level API. 
The API is available in <a href=\"https:\/\/data-flair.training\/blogs\/java-tutorial\/\"><strong>Java<\/strong><\/a>, <strong><a href=\"http:\/\/data-flair.training\/blogs\/why-you-should-learn-scala-introductory-tutorial\/\">Scala<\/a><\/strong>, <a href=\"https:\/\/data-flair.training\/blogs\/python-tutorial-for-beginners\/\"><strong>Python<\/strong><\/a>, and <strong><a href=\"http:\/\/data-flair.training\/blogs\/r-programming-tutorial\/\">R<\/a><\/strong>. Spark is an engine for running distributed data-processing applications. For in-memory workloads, Spark can run up to 100 times faster than <strong><a href=\"http:\/\/data-flair.training\/blogs\/history-big-data\/\">Big Data<\/a><\/strong> Hadoop MapReduce, and up to 10 times faster when the data is processed on disk.<br \/>\nSpark is written in Scala but provides rich APIs in Scala, Java, Python, and R.<br \/>\nIt integrates with <strong><a href=\"http:\/\/data-flair.training\/blogs\/hadoop-introduction-comprehensive-tutorial-guide-beginners\/\">Hadoop<\/a><\/strong> and can process existing Hadoop <strong><a href=\"http:\/\/data-flair.training\/blogs\/introduction-tutorial-hdfs\/\">HDFS<\/a><\/strong> data. Follow this guide to <strong><a href=\"http:\/\/data-flair.training\/blogs\/apache-spark-hadoop-compatibility\/\">learn how Spark is compatible with Hadoop<\/a><\/strong>.<br \/>\nAs the saying goes, a picture is worth a thousand words, so we have also provided a Spark video tutorial to help you understand Apache Spark better.<\/p>\n<h2>3. History Of Apache Spark<\/h2>\n<p>Apache Spark was started in 2009 at UC Berkeley&#8217;s R&amp;D Lab, which later became the AMPLab. It was open-sourced in 2010 under a BSD license. In 2013, Spark was donated to the Apache Software Foundation, where it became a top-level Apache project in 2014.<\/p>\n<h2>4. 
Why Spark?<\/h2>\n<p>Having introduced Apache Spark, let&#8217;s discuss why it came into existence.<br \/>\nThe industry needed a general-purpose cluster computing tool because the existing tools each covered only one kind of workload:<\/p>\n<ul>\n<li>Hadoop <strong><a href=\"http:\/\/data-flair.training\/blogs\/hadoop-mapreduce-tutorial-comprehensive-guide-beginners\/\">MapReduce<\/a><\/strong> can only perform batch processing.<\/li>\n<li>Apache Storm \/ S4 can only perform stream processing.<\/li>\n<li>Apache Impala \/ Apache Tez can only perform interactive processing.<\/li>\n<li>Neo4j \/ Apache Giraph can only perform graph processing.<\/li>\n<\/ul>\n<p>Hence there is strong demand for a powerful engine that can process data in real time (streaming) as well as in batch mode. Such an engine should respond within sub-second latency and perform <a href=\"http:\/\/data-flair.training\/blogs\/apache-spark-in-memory-computing\/\"><strong>in-memory processing<\/strong><\/a>.<br \/>\nApache Spark is a powerful open-source engine that provides real-time stream processing, interactive processing, graph processing, and in-memory processing as well as batch processing, with high speed, ease of use, and a standard interface. This is the key difference in the <strong><a href=\"http:\/\/data-flair.training\/blogs\/comparison-between-apache-spark-vs-hadoop-mapreduce\/\">Hadoop vs Spark<\/a><\/strong> comparison, and it also matters in the <strong><a href=\"http:\/\/data-flair.training\/blogs\/comparison-storm-spark-streaming\/\">comparison between Spark vs Storm<\/a><\/strong>.<br \/>\nSo far in this tutorial, we have discussed the definition, history, and importance of Spark. Now let&#8217;s move on to the Spark components.<\/p>\n<h2>5. Apache Spark Components<\/h2>\n<p>Apache Spark promises faster data processing and easier development. How does Spark achieve this? 
To answer this question, let\u2019s introduce the Apache Spark ecosystem, the set of components that makes Spark fast and reliable. These components resolve the issues that cropped up while using Hadoop MapReduce.<\/p>\n<div id=\"attachment_3194\" style=\"width: 812px\" class=\"wp-caption aligncenter\"><a href=\"https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2017\/07\/spark-ecosystem-components.jpg\"><img loading=\"lazy\" decoding=\"async\" aria-describedby=\"caption-attachment-3194\" class=\"wp-image-3194 size-full\" src=\"https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2017\/07\/spark-ecosystem-components.jpg\" alt=\"Spark Ecosystem Components\" width=\"802\" height=\"420\" srcset=\"https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2017\/07\/spark-ecosystem-components.jpg 802w, https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2017\/07\/spark-ecosystem-components-150x79.jpg 150w, https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2017\/07\/spark-ecosystem-components-300x157.jpg 300w, https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2017\/07\/spark-ecosystem-components-768x402.jpg 768w\" sizes=\"auto, (max-width: 802px) 100vw, 802px\" \/><\/a><p id=\"caption-attachment-3194\" class=\"wp-caption-text\">What is Spark &#8211; Spark Ecosystem Components<\/p><\/div>\n<p>Here we discuss the Spark ecosystem components one by one.<\/p>\n<h3>i. Spark Core<\/h3>\n<p>It is the kernel of Spark and provides the execution platform for all Spark applications. It is a generalized platform that supports a wide array of applications.<\/p>\n<h3>ii. Spark SQL<\/h3>\n<p>It enables users to run SQL\/HQL queries on top of Spark. Using Apache Spark SQL, we can process structured as well as semi-structured data. 
It also provides an engine that lets Hive run unmodified queries up to 100 times faster on existing deployments. Refer to the <strong><a href=\"http:\/\/data-flair.training\/blogs\/introduction-to-apache-spark-sql-tutorial\/\">Spark SQL Tutorial<\/a><\/strong> for a detailed study.<\/p>\n<h3>iii. Spark Streaming<\/h3>\n<p>Apache Spark Streaming enables powerful interactive and <strong><a href=\"http:\/\/data-flair.training\/blogs\/data-analytics-comprehensive-guide\/\">data analytics<\/a><\/strong> applications over live streaming data. The live streams are converted into micro-batches, which are executed on top of Spark Core. Refer to our <strong><a href=\"http:\/\/data-flair.training\/blogs\/apache-spark-streaming-comprehensive-guide\/\">Spark Streaming tutorial<\/a><\/strong> for a detailed study of Apache Spark Streaming.<\/p>\n<h3>iv. Spark MLlib<\/h3>\n<p>It is a scalable <a href=\"http:\/\/data-flair.training\/blogs\/machine-learning-tutorial\/\"><strong>machine learning<\/strong><\/a> library that delivers both efficiency and high-quality algorithms. Apache Spark MLlib is a popular choice among <strong><a href=\"http:\/\/data-flair.training\/blogs\/valuable-skills-to-become-successful-data-scientist\/\">data scientists<\/a><\/strong> because its in-memory data processing drastically improves the performance of iterative algorithms.<\/p>\n<h3>v. Spark GraphX<\/h3>\n<p>Apache Spark <a href=\"https:\/\/data-flair.training\/blogs\/graphx-api-spark\/\"><strong>GraphX<\/strong><\/a> is the graph computation engine built on top of Spark that makes it possible to process graph data at scale.<\/p>\n<h3>vi. SparkR<\/h3>\n<p>It is an <strong><a href=\"http:\/\/data-flair.training\/blogs\/r-packages-tutorial\/\">R package<\/a><\/strong> that provides a lightweight frontend for using Apache Spark from R. It allows data scientists to analyze large datasets and interactively run jobs on them from the R shell. 
The main idea behind <a href=\"https:\/\/data-flair.training\/blogs\/sparkr\/\"><strong>SparkR<\/strong><\/a> was to explore different techniques to integrate the usability of R with the scalability of Spark.<br \/>\nRefer to the <strong><a href=\"http:\/\/data-flair.training\/blogs\/apache-spark-ecosystem-components\/\">Spark Ecosystem Guide<\/a><\/strong> for a detailed study of the Spark components.<\/p>\n<h2>6. Resilient Distributed Dataset &#8211; RDD<\/h2>\n<p>In this section of the Apache Spark tutorial, we discuss the key abstraction of Spark, known as the RDD.<br \/>\nA <a href=\"http:\/\/data-flair.training\/blogs\/resilient-distributed-datasets-rdd-apache-spark\/\"><strong>Resilient Distributed Dataset<\/strong><\/a> (RDD) is the fundamental unit of data in Apache Spark: a collection of elements distributed across the cluster nodes that can be operated on in parallel. Spark RDDs are immutable, but a new RDD can be generated by transforming an existing one.<br \/>\nThere are three ways to create RDDs in Spark:<\/p>\n<ul>\n<li><strong>Parallelized collections<\/strong> \u2013 We can create a parallelized collection by invoking the parallelize method on an existing collection in the driver program.<\/li>\n<li><strong>External datasets<\/strong> \u2013 We can create an RDD by calling the textFile method. This method takes the URI of the file and reads it as a collection of lines.<\/li>\n<li><strong>Existing RDDs<\/strong> \u2013 Applying a transformation to an existing RDD creates a new RDD.<\/li>\n<\/ul>\n<p>Learn <strong><a href=\"http:\/\/data-flair.training\/blogs\/how-to-create-rdds-in-apache-spark\/\">how to create RDDs in Spark in detail.<\/a><\/strong><br \/>\nApache Spark RDDs support two types of operations:<\/p>\n<ul>\n<li><strong>Transformation<\/strong> \u2013 Creates a new RDD from the existing one. 
It applies a function to the dataset and returns a new dataset. Transformations are evaluated lazily, only when an action needs a result.<\/li>\n<li><strong>Action<\/strong> \u2013 Returns the final result to the driver program or writes it to an external data store.<\/li>\n<\/ul>\n<p>Refer to this link to<strong><a href=\"http:\/\/data-flair.training\/blogs\/rdd-transformations-actions-apis-apache-spark\/\"> learn the RDD Transformations and Actions APIs with examples<\/a>.<\/strong><\/p>\n<h2>7. Spark Shell<\/h2>\n<p>Apache Spark provides an interactive <strong><a href=\"https:\/\/data-flair.training\/blogs\/scala-spark-shell-commands\/\">spark-shell<\/a><\/strong>. It lets us run Spark code directly from the command line of the system. Using the Spark shell, we can run and test our application code interactively. Spark can read from many types of data sources, so it can access and process large amounts of data.<\/p>\n<p>So, this was all in the tutorial explaining what Spark is. We hope you liked it.<\/p>\n<h2>8. Conclusion &#8211; What is Spark?<\/h2>\n<p>Spark provides a collection of technologies that increase the value of big data and enable new use cases. It gives us a unified framework for creating, managing, and implementing big data processing requirements. The Spark video tutorial provides more detailed information about Spark.<br \/>\nIn addition to MapReduce-style operations, one can also run SQL queries and process streaming data through Spark, capabilities that Hadoop 1 lacked. 
Developers can use Spark features on their own or combine them with MapReduce programming techniques.<br \/>\n<strong>See Also<\/strong><\/p>\n<ul>\n<li><strong><a href=\"http:\/\/data-flair.training\/blogs\/apache-spark-installation-in-standalone-mode\/\">Spark Installation in pseudo distributed mode.<\/a><\/strong><\/li>\n<li><strong><a href=\"http:\/\/data-flair.training\/blogs\/apache-spark-features\/\">Features of Spark.<\/a><\/strong><\/li>\n<\/ul>\n<p><strong><a href=\"http:\/\/spark.apache.org\/\">Reference<\/a><\/strong><\/p>\n","protected":false},"excerpt":{"rendered":"<p>1. 
Objective &#8211; Spark Tutorial What is Spark? Why there is a serious buzz going on about this technology? I hope this Spark introduction tutorial will help to answer some of these questions.\u00a0 Apache&#46;&#46;&#46;<\/p>\n","protected":false},"author":6,"featured_media":43018,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[10],"tags":[896,958,959,11340,13024,13028,13029,13043,13050,13057,13072,13142,13145,15959],"class_list":["post-1161","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-spark","tag-apache-spark","tag-apache-spark-tutorial","tag-apache-spark-tutorial-for-beginners","tag-rdd","tag-spark-and-scala","tag-spark-architecture","tag-spark-big-data","tag-spark-components","tag-spark-definition","tag-spark-ecosystem","tag-spark-introduction","tag-spark-tutorial","tag-spark-video-tutorial","tag-what-is-spark"],"yoast_head":"<!-- This site is optimized with the Yoast SEO plugin v27.8 - https:\/\/yoast.com\/product\/yoast-seo-wordpress\/ -->\n<title>What is Spark - Apache Spark Tutorial for Beginners - DataFlair<\/title>\n<meta name=\"description\" content=\"What is Spark tutorial about Spark introduction, why spark, Hadoop vs Apache Spark, Need of Spark, Architecture, Spark Ecosystem, Spark RDD and Spark shell.\" \/>\n<meta name=\"robots\" content=\"index, follow, max-snippet:-1, max-image-preview:large, max-video-preview:-1\" \/>\n<link rel=\"canonical\" href=\"https:\/\/data-flair.training\/blogs\/what-is-spark\/\" \/>\n<meta property=\"og:locale\" content=\"en_US\" \/>\n<meta property=\"og:type\" content=\"article\" \/>\n<meta property=\"og:title\" content=\"What is Spark - Apache Spark Tutorial for Beginners - DataFlair\" \/>\n<meta property=\"og:description\" content=\"What is Spark tutorial about Spark introduction, why spark, Hadoop vs Apache Spark, Need of Spark, Architecture, Spark Ecosystem, Spark RDD and Spark 
shell.\" \/>\n<meta property=\"og:url\" content=\"https:\/\/data-flair.training\/blogs\/what-is-spark\/\" \/>\n<meta property=\"og:site_name\" content=\"DataFlair\" \/>\n<meta property=\"article:publisher\" content=\"https:\/\/www.facebook.com\/DataFlairWS\/\" \/>\n<meta property=\"article:published_time\" content=\"2016-09-21T11:53:34+00:00\" \/>\n<meta property=\"article:modified_time\" content=\"2018-11-21T04:51:09+00:00\" \/>\n<meta property=\"og:image\" content=\"https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2016\/09\/Spark-Tutorial-for-Beginners-01.jpg\" \/>\n\t<meta property=\"og:image:width\" content=\"1200\" \/>\n\t<meta property=\"og:image:height\" content=\"628\" \/>\n\t<meta property=\"og:image:type\" content=\"image\/jpeg\" \/>\n<meta name=\"author\" content=\"DataFlair Team\" \/>\n<meta name=\"twitter:card\" content=\"summary_large_image\" \/>\n<meta name=\"twitter:creator\" content=\"@DataFlairWS\" \/>\n<meta name=\"twitter:site\" content=\"@DataFlairWS\" \/>\n<meta name=\"twitter:label1\" content=\"Written by\" \/>\n\t<meta name=\"twitter:data1\" content=\"DataFlair Team\" \/>\n\t<meta name=\"twitter:label2\" content=\"Est. reading time\" \/>\n\t<meta name=\"twitter:data2\" content=\"6 minutes\" \/>\n<!-- \/ Yoast SEO plugin. 
-->","yoast_head_json":{"title":"What is Spark - Apache Spark Tutorial for Beginners - DataFlair","description":"What is Spark tutorial about Spark introduction, why spark, Hadoop vs Apache Spark, Need of Spark, Architecture, Spark Ecosystem, Spark RDD and Spark shell.","robots":{"index":"index","follow":"follow","max-snippet":"max-snippet:-1","max-image-preview":"max-image-preview:large","max-video-preview":"max-video-preview:-1"},"canonical":"https:\/\/data-flair.training\/blogs\/what-is-spark\/","og_locale":"en_US","og_type":"article","og_title":"What is Spark - Apache Spark Tutorial for Beginners - DataFlair","og_description":"What is Spark tutorial about Spark introduction, why spark, Hadoop vs Apache Spark, Need of Spark, Architecture, Spark Ecosystem, Spark RDD and Spark shell.","og_url":"https:\/\/data-flair.training\/blogs\/what-is-spark\/","og_site_name":"DataFlair","article_publisher":"https:\/\/www.facebook.com\/DataFlairWS\/","article_published_time":"2016-09-21T11:53:34+00:00","article_modified_time":"2018-11-21T04:51:09+00:00","og_image":[{"width":1200,"height":628,"url":"https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2016\/09\/Spark-Tutorial-for-Beginners-01.jpg","type":"image\/jpeg"}],"author":"DataFlair Team","twitter_card":"summary_large_image","twitter_creator":"@DataFlairWS","twitter_site":"@DataFlairWS","twitter_misc":{"Written by":"DataFlair Team","Est. 
reading time":"6 minutes"},"schema":{"@context":"https:\/\/schema.org","@graph":[{"@type":"Article","@id":"https:\/\/data-flair.training\/blogs\/what-is-spark\/#article","isPartOf":{"@id":"https:\/\/data-flair.training\/blogs\/what-is-spark\/"},"author":{"name":"DataFlair Team","@id":"https:\/\/data-flair.training\/blogs\/#\/schema\/person\/2c58ecb4f73a39f0ef993f1ddfcd7b89"},"headline":"What is Spark &#8211; Apache Spark Tutorial for Beginners","datePublished":"2016-09-21T11:53:34+00:00","dateModified":"2018-11-21T04:51:09+00:00","mainEntityOfPage":{"@id":"https:\/\/data-flair.training\/blogs\/what-is-spark\/"},"wordCount":1217,"commentCount":12,"publisher":{"@id":"https:\/\/data-flair.training\/blogs\/#organization"},"image":{"@id":"https:\/\/data-flair.training\/blogs\/what-is-spark\/#primaryimage"},"thumbnailUrl":"https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2016\/09\/Spark-Tutorial-for-Beginners-01.jpg","keywords":["apache spark","apache spark tutorial","Apache Spark Tutorial for Beginners","RDD","spark and scala","spark architecture","spark big data","spark components","spark definition","spark ecosystem","spark introduction","spark tutorial","Spark video tutorial","what is spark"],"articleSection":["Apache Spark Tutorials"],"inLanguage":"en-US","potentialAction":[{"@type":"CommentAction","name":"Comment","target":["https:\/\/data-flair.training\/blogs\/what-is-spark\/#respond"]}]},{"@type":"WebPage","@id":"https:\/\/data-flair.training\/blogs\/what-is-spark\/","url":"https:\/\/data-flair.training\/blogs\/what-is-spark\/","name":"What is Spark - Apache Spark Tutorial for Beginners - 
DataFlair","isPartOf":{"@id":"https:\/\/data-flair.training\/blogs\/#website"},"primaryImageOfPage":{"@id":"https:\/\/data-flair.training\/blogs\/what-is-spark\/#primaryimage"},"image":{"@id":"https:\/\/data-flair.training\/blogs\/what-is-spark\/#primaryimage"},"thumbnailUrl":"https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2016\/09\/Spark-Tutorial-for-Beginners-01.jpg","datePublished":"2016-09-21T11:53:34+00:00","dateModified":"2018-11-21T04:51:09+00:00","description":"What is Spark tutorial about Spark introduction, why spark, Hadoop vs Apache Spark, Need of Spark, Architecture, Spark Ecosystem, Spark RDD and Spark shell.","breadcrumb":{"@id":"https:\/\/data-flair.training\/blogs\/what-is-spark\/#breadcrumb"},"inLanguage":"en-US","potentialAction":[{"@type":"ReadAction","target":["https:\/\/data-flair.training\/blogs\/what-is-spark\/"]}]},{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/data-flair.training\/blogs\/what-is-spark\/#primaryimage","url":"https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2016\/09\/Spark-Tutorial-for-Beginners-01.jpg","contentUrl":"https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2016\/09\/Spark-Tutorial-for-Beginners-01.jpg","width":1200,"height":628,"caption":"What is Spark - Apache Spark Tutorial for Beginners"},{"@type":"BreadcrumbList","@id":"https:\/\/data-flair.training\/blogs\/what-is-spark\/#breadcrumb","itemListElement":[{"@type":"ListItem","position":1,"name":"Blog Home","item":"https:\/\/data-flair.training\/blogs\/"},{"@type":"ListItem","position":2,"name":"Apache Spark Tutorials","item":"https:\/\/data-flair.training\/blogs\/category\/spark\/"},{"@type":"ListItem","position":3,"name":"What is Spark &#8211; Apache Spark Tutorial for Beginners"}]},{"@type":"WebSite","@id":"https:\/\/data-flair.training\/blogs\/#website","url":"https:\/\/data-flair.training\/blogs\/","name":"DataFlair","description":"Learn Today. 
Lead Tomorrow.","publisher":{"@id":"https:\/\/data-flair.training\/blogs\/#organization"},"potentialAction":[{"@type":"SearchAction","target":{"@type":"EntryPoint","urlTemplate":"https:\/\/data-flair.training\/blogs\/?s={search_term_string}"},"query-input":{"@type":"PropertyValueSpecification","valueRequired":true,"valueName":"search_term_string"}}],"inLanguage":"en-US"},{"@type":"Organization","@id":"https:\/\/data-flair.training\/blogs\/#organization","name":"DataFlair","url":"https:\/\/data-flair.training\/blogs\/","logo":{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/data-flair.training\/blogs\/#\/schema\/logo\/image\/","url":"https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2016\/07\/Data-Flair.png","contentUrl":"https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2016\/07\/Data-Flair.png","width":106,"height":48,"caption":"DataFlair"},"image":{"@id":"https:\/\/data-flair.training\/blogs\/#\/schema\/logo\/image\/"},"sameAs":["https:\/\/www.facebook.com\/DataFlairWS\/","https:\/\/x.com\/DataFlairWS","https:\/\/www.linkedin.com\/company\/dataflair-web-services-pvt-ltd\/","https:\/\/www.youtube.com\/user\/DataFlairWS"]},{"@type":"Person","@id":"https:\/\/data-flair.training\/blogs\/#\/schema\/person\/2c58ecb4f73a39f0ef993f1ddfcd7b89","name":"DataFlair Team","image":{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/secure.gravatar.com\/avatar\/1ce4a0e3e542444fc73bbebf83e89e8b73e2d95ccb1fcee64da9945f078b97c5?s=96&d=mm&r=g","url":"https:\/\/secure.gravatar.com\/avatar\/1ce4a0e3e542444fc73bbebf83e89e8b73e2d95ccb1fcee64da9945f078b97c5?s=96&d=mm&r=g","contentUrl":"https:\/\/secure.gravatar.com\/avatar\/1ce4a0e3e542444fc73bbebf83e89e8b73e2d95ccb1fcee64da9945f078b97c5?s=96&d=mm&r=g","caption":"DataFlair Team"},"description":"The DataFlair Team provides industry-driven content on programming, Java, Python, C++, DSA, AI, ML, data Science, Android, Flutter, MERN, Web Development, and technology. 
Our expert educators focus on delivering value-packed, easy-to-follow resources for tech enthusiasts and professionals.","url":"https:\/\/data-flair.training\/blogs\/author\/dfteam2\/"}]}},"amp_enabled":true,"_links":{"self":[{"href":"https:\/\/data-flair.training\/blogs\/wp-json\/wp\/v2\/posts\/1161","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/data-flair.training\/blogs\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/data-flair.training\/blogs\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/data-flair.training\/blogs\/wp-json\/wp\/v2\/users\/6"}],"replies":[{"embeddable":true,"href":"https:\/\/data-flair.training\/blogs\/wp-json\/wp\/v2\/comments?post=1161"}],"version-history":[{"count":5,"href":"https:\/\/data-flair.training\/blogs\/wp-json\/wp\/v2\/posts\/1161\/revisions"}],"predecessor-version":[{"id":43019,"href":"https:\/\/data-flair.training\/blogs\/wp-json\/wp\/v2\/posts\/1161\/revisions\/43019"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/data-flair.training\/blogs\/wp-json\/wp\/v2\/media\/43018"}],"wp:attachment":[{"href":"https:\/\/data-flair.training\/blogs\/wp-json\/wp\/v2\/media?parent=1161"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/data-flair.training\/blogs\/wp-json\/wp\/v2\/categories?post=1161"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/data-flair.training\/blogs\/wp-json\/wp\/v2\/tags?post=1161"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}