

{"id":456,"date":"2016-06-13T15:15:22","date_gmt":"2016-06-13T15:15:22","guid":{"rendered":"http:\/\/data-flair.training\/blogs\/?p=456"},"modified":"2018-11-20T12:46:03","modified_gmt":"2018-11-20T07:16:03","slug":"what-is-apache-spark","status":"publish","type":"post","link":"https:\/\/data-flair.training\/blogs\/what-is-apache-spark\/","title":{"rendered":"What is Apache Spark &#8211; A Quick Guide to Drift in Spark"},"content":{"rendered":"<h2>1. Objective<\/h2>\n<p>In this Apache Spark tutorial, we will take a brief look at what <strong>Apache Spark<\/strong> is and how it came about. Apache Spark is an advanced analytics engine that can process real-time data with ease. It is an in-memory processing framework, which makes it efficient and often much faster than disk-based frameworks such as MapReduce. 
This tutorial also covers the <a href=\"http:\/\/data-flair.training\/blogs\/apache-spark-ecosystem-components\/\"><strong>Spark ecosystem<\/strong><\/a>, the features of Apache Spark, and the industries that use Apache Spark for day-to-day data operations.<\/p>\n<div id=\"attachment_42896\" style=\"width: 1210px\" class=\"wp-caption aligncenter\"><a href=\"https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2016\/06\/Apache-Spark-Tutorial-01.jpg\"><img loading=\"lazy\" decoding=\"async\" aria-describedby=\"caption-attachment-42896\" class=\"size-full wp-image-42896\" src=\"https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2016\/06\/Apache-Spark-Tutorial-01.jpg\" alt=\"What is Apache Spark - A Quick Guide to Drift in Spark\" width=\"1200\" height=\"628\" srcset=\"https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2016\/06\/Apache-Spark-Tutorial-01.jpg 1200w, https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2016\/06\/Apache-Spark-Tutorial-01-150x79.jpg 150w, https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2016\/06\/Apache-Spark-Tutorial-01-300x157.jpg 300w, https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2016\/06\/Apache-Spark-Tutorial-01-768x402.jpg 768w, https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2016\/06\/Apache-Spark-Tutorial-01-1024x536.jpg 1024w, https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2016\/06\/Apache-Spark-Tutorial-01-520x272.jpg 520w\" sizes=\"auto, (max-width: 1200px) 100vw, 1200px\" \/><\/a><p id=\"caption-attachment-42896\" class=\"wp-caption-text\">What is Apache Spark &#8211; A Quick Guide to Drift in Spark<\/p><\/div>\n<h2>2. 
What is Apache Spark?<\/h2>\n<p>Earlier, <strong><a href=\"http:\/\/data-flair.training\/blogs\/hadoop-introduction-comprehensive-tutorial-guide-beginners\/\">Hadoop<\/a> <a href=\"http:\/\/data-flair.training\/blogs\/hadoop-mapreduce-tutorial-comprehensive-guide-beginners\/\">MapReduce<\/a><\/strong> became very popular because of its efficiency in processing large datasets. However, MapReduce supports only batch processing. What if you want to perform real-time analytics, or process live streaming data? <strong><a href=\"http:\/\/data-flair.training\/blogs\/apache-spark-tutorial-quickstart-introduction\/\">Apache Spark<\/a><\/strong>, a very popular data processing engine, addresses all of these problems.<\/p>\n<p>Spark was developed at UC Berkeley starting in 2009, and in 2013 the project was donated to the Apache Software Foundation. Apache Spark became a top-level Apache project in 2014.<\/p>\n<p>Apache Spark is an advanced analytics engine that can process real-time data with ease. It is an <strong><a href=\"http:\/\/data-flair.training\/blogs\/apache-spark-in-memory-computing\/\">in-memory processing<\/a><\/strong> framework, which makes it efficient and often much faster than MapReduce. Apache Spark is especially efficient at iterative data processing. It can keep intermediate data in memory, so the data needed by the next step is already there and does not have to be written to and read back from disk at every step (which saves a great deal of disk I\/O time).<\/p>\n<h2>3. 
Components of Spark<\/h2>\n<div id=\"attachment_3072\" style=\"width: 812px\" class=\"wp-caption aligncenter\"><a href=\"https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2017\/07\/apache-spark-ecosystem-components.jpg\"><img loading=\"lazy\" decoding=\"async\" aria-describedby=\"caption-attachment-3072\" class=\"wp-image-3072 size-full\" src=\"https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2017\/07\/apache-spark-ecosystem-components.jpg\" alt=\"Ecosystem Components of Apache Spark\" width=\"802\" height=\"420\" srcset=\"https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2017\/07\/apache-spark-ecosystem-components.jpg 802w, https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2017\/07\/apache-spark-ecosystem-components-150x79.jpg 150w, https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2017\/07\/apache-spark-ecosystem-components-300x157.jpg 300w, https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2017\/07\/apache-spark-ecosystem-components-768x402.jpg 768w\" sizes=\"auto, (max-width: 802px) 100vw, 802px\" \/><\/a><p id=\"caption-attachment-3072\" class=\"wp-caption-text\">Ecosystem Components of Apache Spark<\/p><\/div>\n<p>Now that we know what Apache Spark is, let&#8217;s discuss the components of the Spark ecosystem that power its functionality.<\/p>\n<h3>3.1. Spark Core<\/h3>\n<p><strong>Apache Spark Core<\/strong> is the execution engine for Spark. It handles critical functionality such as memory management, task scheduling, interaction with storage systems, and fault recovery. Spark Core also exposes APIs in various languages, such as <strong><a href=\"http:\/\/data-flair.training\/blogs\/r-programming-tutorial\/\">R<\/a><\/strong>, <strong>Java<\/strong>, <strong>Python<\/strong>, and <strong><a href=\"http:\/\/data-flair.training\/blogs\/why-you-should-learn-scala-introductory-tutorial\/\">Scala<\/a><\/strong>. 
It is also home to the APIs that define <strong><a href=\"http:\/\/data-flair.training\/blogs\/rdd-apache-spark\/\">resilient distributed datasets (RDDs)<\/a><\/strong>, which are the main programming abstraction of Apache Spark.<\/p>\n<h3>3.2. Spark SQL<\/h3>\n<p><a href=\"http:\/\/data-flair.training\/blogs\/spark-sql-tutorial\/\"><strong>Apache\u00a0Spark SQL<\/strong><\/a> is built on top of Spark Core and handles querying data via SQL. It also supports the Apache <strong><a href=\"http:\/\/data-flair.training\/blogs\/hive-tutorial-an-introductory-guide-for-beginners\/\">Hive<\/a><\/strong> variant of SQL, the <strong>Hive Query Language (HQL)<\/strong>. Along with adding SQL support to Spark, it lets developers combine SQL queries with the programmatic data manipulations supported by RDDs in Java, Scala, and Python.<\/p>\n<h3>3.3. Spark Streaming<\/h3>\n<p><a href=\"http:\/\/data-flair.training\/blogs\/apache-spark-streaming-comprehensive-guide\/\"><strong>Apache\u00a0Spark Streaming<\/strong><\/a> is the component that supports processing of live data streams. Spark Streaming provides an API that closely mirrors Spark Core\u2019s RDD API. This makes it easy for programmers to work with data stored on disk, held in memory, or arriving in real time.<\/p>\n<h3>3.4. MLlib<\/h3>\n<p><strong>Spark MLlib<\/strong> is a library of common machine learning (ML) functionality. It provides various kinds of machine learning algorithms, including regression, clustering, classification, and collaborative filtering, as well as supporting functionality such as model evaluation.<\/p>\n<h3>3.5. GraphX<\/h3>\n<p><strong>GraphX<\/strong> is a library for performing parallel graph computations and manipulations in Apache Spark. 
GraphX extends the <strong><a href=\"http:\/\/data-flair.training\/blogs\/rdd-transformations-actions-apis-apache-spark\/\">Spark RDD API<\/a><\/strong>, much as Spark SQL does. It allows us to create directed graphs with properties attached to each vertex and edge. For manipulating these graphs, GraphX provides various operators and a library of common graph algorithms.<\/p>\n<h3>3.6. SparkR<\/h3>\n<p><strong>SparkR<\/strong> is an <a href=\"http:\/\/data-flair.training\/blogs\/r-packages-tutorial\/\">R package<\/a> that provides a lightweight frontend for using Apache Spark from R. The main idea behind SparkR was to explore different techniques for combining the usability of R with the scalability of Spark.\u00a0It allows <a href=\"http:\/\/data-flair.training\/blogs\/skills-needed-to-become-a-data-scientist\/\">data scientists<\/a> to analyze large datasets and interactively run jobs on them from the R shell.<\/p>\n<h2>4. Features of Apache Spark<\/h2>\n<p>So far, we have covered what Apache Spark is and what its ecosystem components are. Now we will discuss the advantages that brought Spark into the limelight. The main features of Spark are:<\/p>\n<h3>4.1. Speed<\/h3>\n<p>Speed is the main reason behind the popularity of Apache Spark in IT organizations. Because Spark reduces the number of disk read\/write operations, it can run programs up to 100 times faster than Hadoop MapReduce when processing in memory, and up to 10 times faster when processing on disk.<\/p>\n<h3><span id=\"44_Reusability\">4.2. Reusability<\/span><\/h3>\n<p>Apache Spark lets you reuse the same code for batch processing, joining streams against historical data, and running ad-hoc queries on stream state.<\/p>\n<h3><span id=\"45_Fault_tolerance\">4.3. Fault tolerance<\/span><\/h3>\n<p>Spark and its RDD abstraction are designed to handle the failure of any worker node in the cluster seamlessly. Each RDD remembers the lineage of transformations used to build it, so any partitions lost with a failed node can be recomputed. As a result, loss of data and information is negligible. 
Follow this guide to <a href=\"http:\/\/data-flair.training\/blogs\/apache-spark-streaming-fault-tolerance\/\">learn about Spark&#8217;s fault-tolerance feature in detail<\/a>.<\/p>\n<h3>4.4. High-level Analytics<\/h3>\n<p>One of the most distinctive features of Apache Spark is its versatility. Along with MapReduce-style processing, it supports machine learning (ML), graph algorithms, SQL queries, and streaming data.<\/p>\n<h3>4.5. Supports Many Languages<\/h3>\n<p>Spark provides built-in APIs for several languages, such as Java, Scala, Python, and R, so you can write applications in the language of your choice.<\/p>\n<h2>5. Spark in Industries<\/h2>\n<p>IT organizations such as Cloudera, Pivotal, IBM, Intel, and MapR have all integrated Spark into their Hadoop stacks. Databricks, a company founded by some of the original developers of Spark, offers commercial support for it. Companies like Yahoo, NASA, Amazon, Autodesk, eBay, Groupon, Taboola, TripAdvisor, and Zaloni, among others, use Spark for day-to-day data operations.<\/p>\n<h2>6. Conclusion<\/h2>\n<p>In conclusion, Apache Spark is a cluster computing platform designed to be fast and general-purpose. On the speed side, it extends the popular MapReduce model to efficiently support more types of computation, including interactive queries and stream processing. 
Spark also integrates closely with other <a href=\"http:\/\/data-flair.training\/blogs\/why-learn-big-data-use-cases\/\"><strong>big data<\/strong><\/a> tools. This tight integration makes it possible to build applications that seamlessly combine different computation models.<\/p>\n<p>I hope you are now comfortable with Spark, so try your hand at the <a href=\"http:\/\/data-flair.training\/blogs\/apache-spark-quiz-part-1\/\">Apache Spark Quiz<\/a> and test your knowledge.<\/p>\n<p>If you have any questions about Apache Spark, please leave a comment.<br \/>\n<strong>See Also:<\/strong><\/p>\n<ul>\n<li><a href=\"http:\/\/data-flair.training\/blogs\/apache-spark-installation-in-standalone-mode\/\">Apache Spark Installation<\/a><\/li>\n<li><a href=\"http:\/\/data-flair.training\/blogs\/how-apache-spark-works-run-time-spark-architecture\/\">How does Apache Spark work?<\/a><\/li>\n<\/ul>\n<p>Reference:<br \/>\n<a href=\"http:\/\/spark.apache.org\/\">http:\/\/spark.apache.org\/<\/a><\/p>\n","protected":false},"excerpt":{"rendered":"<p>1. Objective In this Apache Spark tutorial, we will have a brief look at What is Apache Spark, What is the history of Spark? 
Apache Spark is an advanced analytics engine which can easily&#46;&#46;&#46;<\/p>\n","protected":false},"author":6,"featured_media":42896,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[10],"tags":[896,1907,1971,3416,5150,11127,13021,13099,13139,13142,13151,13153,13154,13892],"class_list":["post-456","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-spark","tag-apache-spark","tag-big-data","tag-big-data-training","tag-data-science","tag-graphx","tag-quickstart","tag-spark","tag-spark-quickstart","tag-spark-training","tag-spark-tutorial","tag-spark-core","tag-spark-shell","tag-spark-sql","tag-streaming-processing"],"yoast_head":"<!-- This site is optimized with the Yoast SEO plugin v27.4 - https:\/\/yoast.com\/product\/yoast-seo-wordpress\/ -->\n<title>What is Apache Spark - A Quick Guide to Drift in Spark - DataFlair<\/title>\n<meta name=\"description\" content=\"Apache Spark tutorial covers what is Apache Spark,History of Spark, why Apache Spark, Spark architecture,Apache Spark Features &amp; Spark in Bigdata industries\" \/>\n<meta name=\"robots\" content=\"index, follow, max-snippet:-1, max-image-preview:large, max-video-preview:-1\" \/>\n<link rel=\"canonical\" href=\"https:\/\/data-flair.training\/blogs\/what-is-apache-spark\/\" \/>\n<meta property=\"og:locale\" content=\"en_US\" \/>\n<meta property=\"og:type\" content=\"article\" \/>\n<meta property=\"og:title\" content=\"What is Apache Spark - A Quick Guide to Drift in Spark - DataFlair\" \/>\n<meta property=\"og:description\" content=\"Apache Spark tutorial covers what is Apache Spark,History of Spark, why Apache Spark, Spark architecture,Apache Spark Features &amp; Spark in Bigdata industries\" \/>\n<meta property=\"og:url\" content=\"https:\/\/data-flair.training\/blogs\/what-is-apache-spark\/\" \/>\n<meta property=\"og:site_name\" content=\"DataFlair\" \/>\n<meta 
property=\"article:publisher\" content=\"https:\/\/www.facebook.com\/DataFlairWS\/\" \/>\n<meta property=\"article:published_time\" content=\"2016-06-13T15:15:22+00:00\" \/>\n<meta property=\"article:modified_time\" content=\"2018-11-20T07:16:03+00:00\" \/>\n<meta property=\"og:image\" content=\"https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2016\/06\/Apache-Spark-Tutorial-01.jpg\" \/>\n\t<meta property=\"og:image:width\" content=\"1200\" \/>\n\t<meta property=\"og:image:height\" content=\"628\" \/>\n\t<meta property=\"og:image:type\" content=\"image\/jpeg\" \/>\n<meta name=\"author\" content=\"DataFlair Team\" \/>\n<meta name=\"twitter:card\" content=\"summary_large_image\" \/>\n<meta name=\"twitter:creator\" content=\"@DataFlairWS\" \/>\n<meta name=\"twitter:site\" content=\"@DataFlairWS\" \/>\n<meta name=\"twitter:label1\" content=\"Written by\" \/>\n\t<meta name=\"twitter:data1\" content=\"DataFlair Team\" \/>\n\t<meta name=\"twitter:label2\" content=\"Est. reading time\" \/>\n\t<meta name=\"twitter:data2\" content=\"5 minutes\" \/>\n<!-- \/ Yoast SEO plugin. 
-->","yoast_head_json":{"title":"What is Apache Spark - A Quick Guide to Drift in Spark - DataFlair","description":"Apache Spark tutorial covers what is Apache Spark,History of Spark, why Apache Spark, Spark architecture,Apache Spark Features & Spark in Bigdata industries","robots":{"index":"index","follow":"follow","max-snippet":"max-snippet:-1","max-image-preview":"max-image-preview:large","max-video-preview":"max-video-preview:-1"},"canonical":"https:\/\/data-flair.training\/blogs\/what-is-apache-spark\/","og_locale":"en_US","og_type":"article","og_title":"What is Apache Spark - A Quick Guide to Drift in Spark - DataFlair","og_description":"Apache Spark tutorial covers what is Apache Spark,History of Spark, why Apache Spark, Spark architecture,Apache Spark Features & Spark in Bigdata industries","og_url":"https:\/\/data-flair.training\/blogs\/what-is-apache-spark\/","og_site_name":"DataFlair","article_publisher":"https:\/\/www.facebook.com\/DataFlairWS\/","article_published_time":"2016-06-13T15:15:22+00:00","article_modified_time":"2018-11-20T07:16:03+00:00","og_image":[{"width":1200,"height":628,"url":"https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2016\/06\/Apache-Spark-Tutorial-01.jpg","type":"image\/jpeg"}],"author":"DataFlair Team","twitter_card":"summary_large_image","twitter_creator":"@DataFlairWS","twitter_site":"@DataFlairWS","twitter_misc":{"Written by":"DataFlair Team","Est. 
reading time":"5 minutes"},"schema":{"@context":"https:\/\/schema.org","@graph":[{"@type":"Article","@id":"https:\/\/data-flair.training\/blogs\/what-is-apache-spark\/#article","isPartOf":{"@id":"https:\/\/data-flair.training\/blogs\/what-is-apache-spark\/"},"author":{"name":"DataFlair Team","@id":"https:\/\/data-flair.training\/blogs\/#\/schema\/person\/2c58ecb4f73a39f0ef993f1ddfcd7b89"},"headline":"What is Apache Spark &#8211; A Quick Guide to Drift in Spark","datePublished":"2016-06-13T15:15:22+00:00","dateModified":"2018-11-20T07:16:03+00:00","mainEntityOfPage":{"@id":"https:\/\/data-flair.training\/blogs\/what-is-apache-spark\/"},"wordCount":1013,"commentCount":22,"publisher":{"@id":"https:\/\/data-flair.training\/blogs\/#organization"},"image":{"@id":"https:\/\/data-flair.training\/blogs\/what-is-apache-spark\/#primaryimage"},"thumbnailUrl":"https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2016\/06\/Apache-Spark-Tutorial-01.jpg","keywords":["apache spark","big data","big data training","data science","graphx","quickstart","Spark","spark quickstart","spark training","spark tutorial","spark-core","spark-shell","spark-sql","streaming-processing"],"articleSection":["Apache Spark Tutorials"],"inLanguage":"en-US","potentialAction":[{"@type":"CommentAction","name":"Comment","target":["https:\/\/data-flair.training\/blogs\/what-is-apache-spark\/#respond"]}]},{"@type":"WebPage","@id":"https:\/\/data-flair.training\/blogs\/what-is-apache-spark\/","url":"https:\/\/data-flair.training\/blogs\/what-is-apache-spark\/","name":"What is Apache Spark - A Quick Guide to Drift in Spark - 
DataFlair","isPartOf":{"@id":"https:\/\/data-flair.training\/blogs\/#website"},"primaryImageOfPage":{"@id":"https:\/\/data-flair.training\/blogs\/what-is-apache-spark\/#primaryimage"},"image":{"@id":"https:\/\/data-flair.training\/blogs\/what-is-apache-spark\/#primaryimage"},"thumbnailUrl":"https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2016\/06\/Apache-Spark-Tutorial-01.jpg","datePublished":"2016-06-13T15:15:22+00:00","dateModified":"2018-11-20T07:16:03+00:00","description":"Apache Spark tutorial covers what is Apache Spark,History of Spark, why Apache Spark, Spark architecture,Apache Spark Features & Spark in Bigdata industries","breadcrumb":{"@id":"https:\/\/data-flair.training\/blogs\/what-is-apache-spark\/#breadcrumb"},"inLanguage":"en-US","potentialAction":[{"@type":"ReadAction","target":["https:\/\/data-flair.training\/blogs\/what-is-apache-spark\/"]}]},{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/data-flair.training\/blogs\/what-is-apache-spark\/#primaryimage","url":"https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2016\/06\/Apache-Spark-Tutorial-01.jpg","contentUrl":"https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2016\/06\/Apache-Spark-Tutorial-01.jpg","width":1200,"height":628,"caption":"What is Apache Spark - A Quick Guide to Drift in Spark"},{"@type":"BreadcrumbList","@id":"https:\/\/data-flair.training\/blogs\/what-is-apache-spark\/#breadcrumb","itemListElement":[{"@type":"ListItem","position":1,"name":"Blog Home","item":"https:\/\/data-flair.training\/blogs\/"},{"@type":"ListItem","position":2,"name":"Apache Spark Tutorials","item":"https:\/\/data-flair.training\/blogs\/category\/spark\/"},{"@type":"ListItem","position":3,"name":"What is Apache Spark &#8211; A Quick Guide to Drift in Spark"}]},{"@type":"WebSite","@id":"https:\/\/data-flair.training\/blogs\/#website","url":"https:\/\/data-flair.training\/blogs\/","name":"DataFlair","description":"Learn Today. 
Lead Tomorrow.","publisher":{"@id":"https:\/\/data-flair.training\/blogs\/#organization"},"potentialAction":[{"@type":"SearchAction","target":{"@type":"EntryPoint","urlTemplate":"https:\/\/data-flair.training\/blogs\/?s={search_term_string}"},"query-input":{"@type":"PropertyValueSpecification","valueRequired":true,"valueName":"search_term_string"}}],"inLanguage":"en-US"},{"@type":"Organization","@id":"https:\/\/data-flair.training\/blogs\/#organization","name":"DataFlair","url":"https:\/\/data-flair.training\/blogs\/","logo":{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/data-flair.training\/blogs\/#\/schema\/logo\/image\/","url":"https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2016\/07\/Data-Flair.png","contentUrl":"https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2016\/07\/Data-Flair.png","width":106,"height":48,"caption":"DataFlair"},"image":{"@id":"https:\/\/data-flair.training\/blogs\/#\/schema\/logo\/image\/"},"sameAs":["https:\/\/www.facebook.com\/DataFlairWS\/","https:\/\/x.com\/DataFlairWS","https:\/\/www.linkedin.com\/company\/dataflair-web-services-pvt-ltd\/","https:\/\/www.youtube.com\/user\/DataFlairWS"]},{"@type":"Person","@id":"https:\/\/data-flair.training\/blogs\/#\/schema\/person\/2c58ecb4f73a39f0ef993f1ddfcd7b89","name":"DataFlair Team","image":{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/secure.gravatar.com\/avatar\/1ce4a0e3e542444fc73bbebf83e89e8b73e2d95ccb1fcee64da9945f078b97c5?s=96&d=mm&r=g","url":"https:\/\/secure.gravatar.com\/avatar\/1ce4a0e3e542444fc73bbebf83e89e8b73e2d95ccb1fcee64da9945f078b97c5?s=96&d=mm&r=g","contentUrl":"https:\/\/secure.gravatar.com\/avatar\/1ce4a0e3e542444fc73bbebf83e89e8b73e2d95ccb1fcee64da9945f078b97c5?s=96&d=mm&r=g","caption":"DataFlair Team"},"description":"The DataFlair Team provides industry-driven content on programming, Java, Python, C++, DSA, AI, ML, data Science, Android, Flutter, MERN, Web Development, and technology. 
Our expert educators focus on delivering value-packed, easy-to-follow resources for tech enthusiasts and professionals.","url":"https:\/\/data-flair.training\/blogs\/author\/dfteam2\/"}]}},"amp_enabled":true,"_links":{"self":[{"href":"https:\/\/data-flair.training\/blogs\/wp-json\/wp\/v2\/posts\/456","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/data-flair.training\/blogs\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/data-flair.training\/blogs\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/data-flair.training\/blogs\/wp-json\/wp\/v2\/users\/6"}],"replies":[{"embeddable":true,"href":"https:\/\/data-flair.training\/blogs\/wp-json\/wp\/v2\/comments?post=456"}],"version-history":[{"count":6,"href":"https:\/\/data-flair.training\/blogs\/wp-json\/wp\/v2\/posts\/456\/revisions"}],"predecessor-version":[{"id":42897,"href":"https:\/\/data-flair.training\/blogs\/wp-json\/wp\/v2\/posts\/456\/revisions\/42897"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/data-flair.training\/blogs\/wp-json\/wp\/v2\/media\/42896"}],"wp:attachment":[{"href":"https:\/\/data-flair.training\/blogs\/wp-json\/wp\/v2\/media?parent=456"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/data-flair.training\/blogs\/wp-json\/wp\/v2\/categories?post=456"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/data-flair.training\/blogs\/wp-json\/wp\/v2\/tags?post=456"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}