

{"id":2595,"date":"2017-05-13T12:07:48","date_gmt":"2017-05-13T12:07:48","guid":{"rendered":"http:\/\/data-flair.training\/blogs\/?p=2595"},"modified":"2018-11-16T17:20:59","modified_gmt":"2018-11-16T11:50:59","slug":"apache-spark-hadoop-compatibility","status":"publish","type":"post","link":"https:\/\/data-flair.training\/blogs\/apache-spark-hadoop-compatibility\/","title":{"rendered":"Apache Spark Compatibility with Hadoop"},"content":{"rendered":"<h2>1. Objective<\/h2>\n<p>In this tutorial on <strong><a href=\"http:\/\/data-flair.training\/blogs\/apache-spark-introduction-spark-comprehensive-tutorial\/\">Apache Spark<\/a><\/strong> compatibility with Hadoop, we discuss how Spark works with Hadoop. The tutorial covers three ways to use Apache Spark over Hadoop: <strong>Standalone<\/strong>, <strong>YARN<\/strong>, and <strong>SIMR (Spark In MapReduce)<\/strong>. <span class=\"veryhardreadability\">We also walk through the steps to launch a Spark application in standalone mode, on YARN, and in MapReduce (SIMR), and explain how SIMR works<\/span>.<\/p>\n<div id=\"attachment_42457\" style=\"width: 1210px\" class=\"wp-caption aligncenter\"><a href=\"https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2017\/05\/Apache-Spark-Compatibility-with-Hadoop.jpg\"><img loading=\"lazy\" decoding=\"async\" aria-describedby=\"caption-attachment-42457\" class=\"size-full wp-image-42457\" src=\"https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2017\/05\/Apache-Spark-Compatibility-with-Hadoop.jpg\" alt=\"Apache Spark Compatibility with Hadoop\" width=\"1200\" height=\"628\" srcset=\"https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2017\/05\/Apache-Spark-Compatibility-with-Hadoop.jpg 1200w, https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2017\/05\/Apache-Spark-Compatibility-with-Hadoop-150x79.jpg 150w, 
https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2017\/05\/Apache-Spark-Compatibility-with-Hadoop-300x157.jpg 300w, https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2017\/05\/Apache-Spark-Compatibility-with-Hadoop-768x402.jpg 768w, https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2017\/05\/Apache-Spark-Compatibility-with-Hadoop-1024x536.jpg 1024w, https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2017\/05\/Apache-Spark-Compatibility-with-Hadoop-520x272.jpg 520w\" sizes=\"auto, (max-width: 1200px) 100vw, 1200px\" \/><\/a><p id=\"caption-attachment-42457\" class=\"wp-caption-text\">Apache Spark Compatibility with Hadoop<\/p><\/div>\n<h2>2.\u00a0How is Spark compatible with Hadoop?<\/h2>\n<p>It is a common misconception that Spark replaces <strong><a href=\"http:\/\/data-flair.training\/blogs\/hadoop-introduction-tutorial-quick-guide\/\">Hadoop<\/a><\/strong>; rather, it extends the <strong><a href=\"http:\/\/data-flair.training\/blogs\/hadoop-features-design-principles-tutorial\/\">functionality of Hadoop<\/a><\/strong>. From the start, Spark has been able to <a href=\"http:\/\/data-flair.training\/blogs\/hadoop-hdfs-data-read-and-write-operations\/\">read data from and write data to HDFS<\/a> (<strong><a href=\"http:\/\/data-flair.training\/blogs\/apache-hadoop-hdfs-introduction-tutorial\/\">Hadoop Distributed File System<\/a><\/strong>). We can therefore treat Apache Spark as a data processing engine that works on top of Hadoop and can take over both batch and streaming workloads. Hence, running Spark over Hadoop provides enhanced and additional functionality.<\/p>\n<h2>3. 
Apache Spark Compatibility with Hadoop<\/h2>\n<div id=\"attachment_3729\" style=\"width: 812px\" class=\"wp-caption aligncenter\"><a href=\"https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2017\/08\/apache-spark-compatibility-with-hadoop-2.jpg\"><img loading=\"lazy\" decoding=\"async\" aria-describedby=\"caption-attachment-3729\" class=\"wp-image-3729 size-full\" src=\"https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2017\/08\/apache-spark-compatibility-with-hadoop-2.jpg\" alt=\"Spark Hadoop Compatibility\" width=\"802\" height=\"420\" srcset=\"https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2017\/08\/apache-spark-compatibility-with-hadoop-2.jpg 802w, https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2017\/08\/apache-spark-compatibility-with-hadoop-2-150x79.jpg 150w, https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2017\/08\/apache-spark-compatibility-with-hadoop-2-300x157.jpg 300w, https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2017\/08\/apache-spark-compatibility-with-hadoop-2-768x402.jpg 768w\" sizes=\"auto, (max-width: 802px) 100vw, 802px\" \/><\/a><p id=\"caption-attachment-3729\" class=\"wp-caption-text\">Spark Hadoop Compatibility<\/p><\/div>\n<p>We can use Spark over Hadoop in three ways:<\/p>\n<ul>\n<li><strong>Standalone &#8211;<\/strong>\u00a0In this deployment mode, we can allocate resources on all machines or on a subset of machines in a <a href=\"http:\/\/data-flair.training\/blogs\/install-hadoop-2-x-ubuntu-hadoop-multi-node-cluster\/\"><strong>Hadoop cluster<\/strong><\/a>. 
We can run Spark side by side with Hadoop <strong><a href=\"http:\/\/data-flair.training\/blogs\/hadoop-mapreduce-introduction-tutorial-comprehensive-guide\/\">MapReduce.<\/a><\/strong><\/li>\n<li><strong>YARN &#8211;<\/strong>\u00a0We can run Spark on <strong><a href=\"http:\/\/data-flair.training\/blogs\/hadoop-yarn-tutorial\/\">YARN<\/a><\/strong> without any pre-installation or administrative access. This lets us integrate Spark into the Hadoop stack and take <strong><a href=\"http:\/\/data-flair.training\/blogs\/apache-spark-features\/\">advantage of the features of Spark<\/a>.<\/strong><\/li>\n<li><strong>SIMR (Spark in MapReduce) &#8211;<\/strong>\u00a0Another way is to launch Spark jobs inside MapReduce. With SIMR, we can use the Spark shell within a few minutes of downloading it. This reduces the deployment overhead and lets us experiment with Apache Spark right away.<\/li>\n<\/ul>\n<p>Let&#8217;s discuss these three ways of using Apache Spark with Hadoop one by one in detail.<\/p>\n<h3>3.1. Launching Spark Application in Standalone Mode<\/h3>\n<p>Before launching a Spark application in standalone mode, refer to this guide to <a href=\"http:\/\/data-flair.training\/blogs\/apache-spark-installation-in-standalone-mode\/\">learn how to install Apache Spark in Standalone Mode (single node cluster)<\/a>.<br \/>\nSpark supports two deployment modes for a standalone cluster: <strong>cluster mode<\/strong> and <strong>client mode<\/strong>. In client mode, the driver is launched in the same process as the client that submits the application. In cluster mode, the driver is launched from one of the worker processes inside the cluster, and the client process exits as soon as it has submitted the application, without waiting for the application to finish.<\/p>\n<h4>a. Adding the jar<\/h4>\n<p>If we launch the application through <strong>spark-submit<\/strong>, the application jar is automatically distributed to all worker nodes. 
For any additional jars, specify them through the <em>--jars<\/em> flag, using a comma as the delimiter. If we pass the <em>--supervise<\/em> flag to spark-submit, standalone cluster mode restarts the application automatically when it exits with a non-zero exit code.<\/p>\n<h4>b. Running application in Standalone Mode<\/h4>\n<p>To run a Spark application in standalone mode with input from <strong><a href=\"http:\/\/data-flair.training\/blogs\/features-hadoop-hdfs-overview-beginners\/\">HDFS<\/a><\/strong>, use a command of the following form, where &lt;master-host&gt; is a placeholder for the host of the standalone master (7077 is its default port):<br \/>\n[php]$ .\/bin\/spark-submit --class MyApp --master spark:\/\/&lt;master-host&gt;:7077 MyApp.jar --input hdfs:\/\/localhost:9000\/input-file-path --output output-file-path[\/php]<\/p>\n<h3>3.2. Launching Spark on YARN<\/h3>\n<p>Support for running Apache Spark on YARN was added in version 0.6.0 and improved in later releases.<br \/>\nTo run a Spark job on a Hadoop YARN cluster (<a href=\"http:\/\/data-flair.training\/blogs\/setup-hadoop-2-yarn-psedo-distributed-mode\/\">learn to configure Hadoop with YARN in pseudo-distributed mode<\/a>), we first need to build the Spark assembly JAR, which bundles all the required dependencies. With the sbt-based build of older Spark releases, we do this by setting the Hadoop version and the SPARK_YARN\u00a0environment variable:<br \/>\n<em>SPARK_HADOOP_VERSION=2.0.5-alpha SPARK_YARN=true sbt\/sbt assembly<\/em><br \/>\nMake sure that YARN_CONF_DIR or HADOOP_CONF_DIR points to the directory that contains the configuration files for the <a href=\"http:\/\/data-flair.training\/blogs\/installation-hadoop-3-x-ubuntu-pseudo-distributed-mode\/\"><strong>Hadoop cluster<\/strong><\/a>. These configurations are used to write to HDFS and to connect to the <strong><a href=\"http:\/\/data-flair.training\/blogs\/hadoop-yarn-resource-manager-guide-tutorial\/\">YARN Resource Manager<\/a><\/strong>. 
The configuration contained in this directory is distributed to the YARN cluster so that all the <strong>containers<\/strong> used by the application use the same configuration.<br \/>\nThere are two deployment modes for launching a Spark application on YARN: <em>cluster mode<\/em> and <em>client mode<\/em>.<br \/>\n<strong>i. Cluster Mode &#8211;<\/strong>\u00a0In cluster mode, the <strong>Spark driver<\/strong> runs inside an application master process that is managed by YARN on the cluster.<br \/>\n<strong>ii. Client Mode &#8211;<\/strong>\u00a0In this mode, the driver runs in the client process, and the application master is used only for requesting resources from YARN and providing them to the driver program.<br \/>\nIn YARN mode, the address of the ResourceManager is taken from the Hadoop configuration, so the --master parameter is simply yarn.<br \/>\nTo launch a Spark application in cluster mode, use the command:<br \/>\n[php]$ .\/bin\/spark-submit --class path.to.your.Class --master yarn --deploy-mode cluster [options] &lt;app jar&gt; [app options][\/php]<br \/>\nTo launch a Spark application in client mode, use the same command with cluster replaced by client:<br \/>\n[php]$ .\/bin\/spark-submit --class path.to.your.Class --master yarn --deploy-mode client [options] &lt;app jar&gt; [app options][\/php]<\/p>\n<h4>a.\u00a0Adding other JARs<\/h4>\n<p>In cluster mode, the driver runs on a different machine than the client, so <strong><a href=\"http:\/\/data-flair.training\/blogs\/sparkcontext-in-apache-spark-tutorial\/\">SparkContext<\/a>.addJar<\/strong> does not work out of the box with files that are local to the client. To make such files available to it, include them with the --jars option in the launch command.<br \/>\n<strong>For Example:<\/strong><br \/>\n[php]$ .\/bin\/spark-submit --class my.main.Class \\<br \/>\n--master yarn \\<br \/>\n--deploy-mode cluster \\<br \/>\n--jars my-other-jar.jar,my-other-other-jar.jar \\<br 
\/>\nmy-main-jar.jar \\<br \/>\napp_arg1 app_arg2[\/php]<br \/>\nTo make the Spark runtime jars accessible from the YARN side, specify <em>spark.yarn.archive<\/em> or <em>spark.yarn.jars<\/em>.<br \/>\nIf neither is specified, Spark creates a zip file with all jars under $SPARK_HOME\/jars and uploads it to the distributed cache.<\/p>\n<h4>b. Debugging Application on YARN<\/h4>\n<p>In YARN, both the application master and the executors run inside \u201c<strong>containers<\/strong>\u201d. YARN has two modes for handling container logs once the application has completed.<\/p>\n<h5>i. If log aggregation is turned on<\/h5>\n<p>If log aggregation is turned on with the <em>yarn.log-aggregation-enable<\/em> config, the container logs are copied to <strong>HDFS<\/strong> and deleted from the local machine. We can then view these logs from anywhere on the cluster with the command:<br \/>\n[php]yarn logs -applicationId &lt;app ID&gt;[\/php]<br \/>\nThis command prints the contents of all log files from all containers of the given application.<br \/>\nWe can also view the container log files directly in HDFS using the HDFS shell or API. The directory where they are stored is given by the YARN configuration properties <em>yarn.nodemanager.remote-app-log-dir<\/em> and <em>yarn.nodemanager.remote-app-log-dir-suffix<\/em>.<\/p>\n<h5>ii. If log aggregation is not turned on<\/h5>\n<p>In this case, logs are kept locally on each machine under\u00a0<em>yarn_app_logs_dir<\/em>, which is usually configured as \/tmp\/logs\u00a0or\u00a0$HADOOP_HOME\/logs\/userlogs, depending on the Hadoop version and installation.<br \/>\nTo view the logs of a container, we must go to the host that contains them and look in this directory. Its subdirectories organize the log files by application ID and container ID.<\/p>\n<h3>3.3. 
Launching Spark in MapReduce (SIMR)<\/h3>\n<p>SIMR is an easy way for Hadoop MapReduce 1 users to use Apache Spark. With it, we can run Spark jobs and the Spark shell without <strong><a href=\"http:\/\/data-flair.training\/blogs\/apache-spark-installation-on-multi-node-cluster-step-by-step-guide\/\">installing Spark<\/a><\/strong> or <strong><a href=\"http:\/\/data-flair.training\/blogs\/why-you-should-learn-scala-introductory-tutorial\/\">Scala<\/a><\/strong>, and without administrative rights. The only prerequisites are HDFS access and MapReduce v1. SIMR is open source and is a joint work of Databricks and UC Berkeley\u00a0AMPLab. Once we have downloaded SIMR, we can try it by typing<br \/>\n[php].\/simr --shell[\/php]<br \/>\nTo use it, we only need to download the SIMR package that matches the version of our <a href=\"http:\/\/data-flair.training\/blogs\/install-deploy-cloudera-hadoop-cdh5-apache-2-x-centos\/\"><strong>Hadoop cluster.<\/strong><\/a> The SIMR package contains 3 files:<br \/>\nSIMR runtime script: simr<br \/>\nsimr-&lt;hadoop-version&gt;.jar<br \/>\nspark-assembly-&lt;hadoop-version&gt;.jar<br \/>\nTo get usage information, place all three in the same directory and execute simr. The job of SIMR is to launch a MapReduce job with the required number of map slots and to make sure that Spark, Scala, and the job are shipped to all those nodes. One of the <strong><a href=\"http:\/\/data-flair.training\/blogs\/mapper-in-hadoop-mapreduce\/\">mappers<\/a><\/strong> is chosen as the master, and the Spark driver runs inside that mapper. On the remaining mappers, SIMR launches Spark executors, which execute tasks on behalf of the driver.<br \/>\nThe master is selected through leader election by <strong><a href=\"http:\/\/data-flair.training\/blogs\/hdfs-data-write-operation\/\">writing to HDFS<\/a><\/strong>: the mapper that writes to HDFS first becomes the master mapper. 
The remaining mappers find the driver URL by <strong><a href=\"http:\/\/data-flair.training\/blogs\/hdfs-data-read-operation\/\">reading a specific file from HDFS<\/a><\/strong>. Thus, SIMR uses MapReduce and HDFS in place of a cluster manager.<\/p>\n<h4>a. How does SIMR work?<\/h4>\n<p><strong>SIMR<\/strong> allows the user to interact with the driver program. SIMR runs a relay server on the master mapper and a relay client on the machine that launches SIMR. Input typed at the client and output from the driver are relayed back and forth between the client and the master mapper. SIMR makes extensive use of HDFS to achieve all this.<\/p>\n<h2>4. Conclusion<\/h2>\n<p>In conclusion, Spark is a data processing framework that works on top of Hadoop and can take over both batch and streaming workloads. Hence, running Spark over Hadoop provides enhanced and additional functionality. After studying Spark's compatibility with Hadoop, follow this guide to <strong><a href=\"http:\/\/data-flair.training\/blogs\/how-apache-spark-works-run-time-spark-architecture\/\">learn how Apache Spark works<\/a><\/strong>.<br \/>\nIf you have any questions about this post on Apache Spark compatibility with Hadoop, please feel free to share them with us. 
Hope we will solve them.<br \/>\n<strong>Reference-<\/strong><br \/>\n<a href=\"http:\/\/spark.apache.org\/\" target=\"_blank\" rel=\"noopener noreferrer\">http:\/\/spark.apache.org\/<\/a><br \/>\n<strong>See Also-<\/strong><\/p>\n<ul>\n<li><a href=\"http:\/\/data-flair.training\/blogs\/limitations-of-apache-spark\/\">Apache Spark Limitations<\/a><\/li>\n<li><a href=\"http:\/\/data-flair.training\/blogs\/apache-spark-interview-questions-and-answers\/\">50+ Apache Spark Interview Questions and Answers<\/a><\/li>\n<\/ul>\n","protected":false},"excerpt":{"rendered":"<p>1. Objective In this tutorial on Apache Spark compatibility with Hadoop, we will discuss how Spark is compatible with Hadoop? This tutorial covers three ways to use Apache Spark over Hadoop i.e. Standalone, YARN,&#46;&#46;&#46;<\/p>\n","protected":false},"author":6,"featured_media":42457,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[10],"tags":[2787,13023,13042],"class_list":["post-2595","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-spark","tag-compatibility-of-spark-with-hadoop","tag-spark-and-hadoop-compatibility","tag-spark-compatibility-with-hadoop"],"yoast_head":"<!-- This site is optimized with the Yoast SEO plugin v27.8 - https:\/\/yoast.com\/product\/yoast-seo-wordpress\/ -->\n<title>Apache Spark Compatibility with Hadoop - DataFlair<\/title>\n<meta name=\"description\" content=\"Apache Spark Compatibility with Hadoop tutorial-3 Ways Apache Spark Works With Apache Hadoop-Spark Standalone Mode,Spark on YARN,SIMR. 
learn how SIMR works?\" \/>\n<meta name=\"robots\" content=\"index, follow, max-snippet:-1, max-image-preview:large, max-video-preview:-1\" \/>\n<link rel=\"canonical\" href=\"https:\/\/data-flair.training\/blogs\/apache-spark-hadoop-compatibility\/\" \/>\n<meta property=\"og:locale\" content=\"en_US\" \/>\n<meta property=\"og:type\" content=\"article\" \/>\n<meta property=\"og:title\" content=\"Apache Spark Compatibility with Hadoop - DataFlair\" \/>\n<meta property=\"og:description\" content=\"Apache Spark Compatibility with Hadoop tutorial-3 Ways Apache Spark Works With Apache Hadoop-Spark Standalone Mode,Spark on YARN,SIMR. learn how SIMR works?\" \/>\n<meta property=\"og:url\" content=\"https:\/\/data-flair.training\/blogs\/apache-spark-hadoop-compatibility\/\" \/>\n<meta property=\"og:site_name\" content=\"DataFlair\" \/>\n<meta property=\"article:publisher\" content=\"https:\/\/www.facebook.com\/DataFlairWS\/\" \/>\n<meta property=\"article:published_time\" content=\"2017-05-13T12:07:48+00:00\" \/>\n<meta property=\"article:modified_time\" content=\"2018-11-16T11:50:59+00:00\" \/>\n<meta property=\"og:image\" content=\"https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2017\/05\/Apache-Spark-Compatibility-with-Hadoop.jpg\" \/>\n\t<meta property=\"og:image:width\" content=\"1200\" \/>\n\t<meta property=\"og:image:height\" content=\"628\" \/>\n\t<meta property=\"og:image:type\" content=\"image\/jpeg\" \/>\n<meta name=\"author\" content=\"DataFlair Team\" \/>\n<meta name=\"twitter:card\" content=\"summary_large_image\" \/>\n<meta name=\"twitter:creator\" content=\"@DataFlairWS\" \/>\n<meta name=\"twitter:site\" content=\"@DataFlairWS\" \/>\n<meta name=\"twitter:label1\" content=\"Written by\" \/>\n\t<meta name=\"twitter:data1\" content=\"DataFlair Team\" \/>\n\t<meta name=\"twitter:label2\" content=\"Est. reading time\" \/>\n\t<meta name=\"twitter:data2\" content=\"8 minutes\" \/>\n<!-- \/ Yoast SEO plugin. 
-->","yoast_head_json":{"title":"Apache Spark Compatibility with Hadoop - DataFlair","description":"Apache Spark Compatibility with Hadoop tutorial-3 Ways Apache Spark Works With Apache Hadoop-Spark Standalone Mode,Spark on YARN,SIMR. learn how SIMR works?","robots":{"index":"index","follow":"follow","max-snippet":"max-snippet:-1","max-image-preview":"max-image-preview:large","max-video-preview":"max-video-preview:-1"},"canonical":"https:\/\/data-flair.training\/blogs\/apache-spark-hadoop-compatibility\/","og_locale":"en_US","og_type":"article","og_title":"Apache Spark Compatibility with Hadoop - DataFlair","og_description":"Apache Spark Compatibility with Hadoop tutorial-3 Ways Apache Spark Works With Apache Hadoop-Spark Standalone Mode,Spark on YARN,SIMR. learn how SIMR works?","og_url":"https:\/\/data-flair.training\/blogs\/apache-spark-hadoop-compatibility\/","og_site_name":"DataFlair","article_publisher":"https:\/\/www.facebook.com\/DataFlairWS\/","article_published_time":"2017-05-13T12:07:48+00:00","article_modified_time":"2018-11-16T11:50:59+00:00","og_image":[{"width":1200,"height":628,"url":"https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2017\/05\/Apache-Spark-Compatibility-with-Hadoop.jpg","type":"image\/jpeg"}],"author":"DataFlair Team","twitter_card":"summary_large_image","twitter_creator":"@DataFlairWS","twitter_site":"@DataFlairWS","twitter_misc":{"Written by":"DataFlair Team","Est. 
reading time":"8 minutes"},"schema":{"@context":"https:\/\/schema.org","@graph":[{"@type":"Article","@id":"https:\/\/data-flair.training\/blogs\/apache-spark-hadoop-compatibility\/#article","isPartOf":{"@id":"https:\/\/data-flair.training\/blogs\/apache-spark-hadoop-compatibility\/"},"author":{"name":"DataFlair Team","@id":"https:\/\/data-flair.training\/blogs\/#\/schema\/person\/2c58ecb4f73a39f0ef993f1ddfcd7b89"},"headline":"Apache Spark Compatibility with Hadoop","datePublished":"2017-05-13T12:07:48+00:00","dateModified":"2018-11-16T11:50:59+00:00","mainEntityOfPage":{"@id":"https:\/\/data-flair.training\/blogs\/apache-spark-hadoop-compatibility\/"},"wordCount":1530,"commentCount":0,"publisher":{"@id":"https:\/\/data-flair.training\/blogs\/#organization"},"image":{"@id":"https:\/\/data-flair.training\/blogs\/apache-spark-hadoop-compatibility\/#primaryimage"},"thumbnailUrl":"https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2017\/05\/Apache-Spark-Compatibility-with-Hadoop.jpg","keywords":["compatibility of spark with hadoop","spark and hadoop compatibility","spark compatibility with hadoop"],"articleSection":["Apache Spark Tutorials"],"inLanguage":"en-US","potentialAction":[{"@type":"CommentAction","name":"Comment","target":["https:\/\/data-flair.training\/blogs\/apache-spark-hadoop-compatibility\/#respond"]}]},{"@type":"WebPage","@id":"https:\/\/data-flair.training\/blogs\/apache-spark-hadoop-compatibility\/","url":"https:\/\/data-flair.training\/blogs\/apache-spark-hadoop-compatibility\/","name":"Apache Spark Compatibility with Hadoop - 
DataFlair","isPartOf":{"@id":"https:\/\/data-flair.training\/blogs\/#website"},"primaryImageOfPage":{"@id":"https:\/\/data-flair.training\/blogs\/apache-spark-hadoop-compatibility\/#primaryimage"},"image":{"@id":"https:\/\/data-flair.training\/blogs\/apache-spark-hadoop-compatibility\/#primaryimage"},"thumbnailUrl":"https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2017\/05\/Apache-Spark-Compatibility-with-Hadoop.jpg","datePublished":"2017-05-13T12:07:48+00:00","dateModified":"2018-11-16T11:50:59+00:00","description":"Apache Spark Compatibility with Hadoop tutorial-3 Ways Apache Spark Works With Apache Hadoop-Spark Standalone Mode,Spark on YARN,SIMR. learn how SIMR works?","breadcrumb":{"@id":"https:\/\/data-flair.training\/blogs\/apache-spark-hadoop-compatibility\/#breadcrumb"},"inLanguage":"en-US","potentialAction":[{"@type":"ReadAction","target":["https:\/\/data-flair.training\/blogs\/apache-spark-hadoop-compatibility\/"]}]},{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/data-flair.training\/blogs\/apache-spark-hadoop-compatibility\/#primaryimage","url":"https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2017\/05\/Apache-Spark-Compatibility-with-Hadoop.jpg","contentUrl":"https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2017\/05\/Apache-Spark-Compatibility-with-Hadoop.jpg","width":1200,"height":628,"caption":"Apache Spark Compatibility with Hadoop"},{"@type":"BreadcrumbList","@id":"https:\/\/data-flair.training\/blogs\/apache-spark-hadoop-compatibility\/#breadcrumb","itemListElement":[{"@type":"ListItem","position":1,"name":"Blog Home","item":"https:\/\/data-flair.training\/blogs\/"},{"@type":"ListItem","position":2,"name":"Apache Spark Tutorials","item":"https:\/\/data-flair.training\/blogs\/category\/spark\/"},{"@type":"ListItem","position":3,"name":"Apache Spark Compatibility with 
Hadoop"}]},{"@type":"WebSite","@id":"https:\/\/data-flair.training\/blogs\/#website","url":"https:\/\/data-flair.training\/blogs\/","name":"DataFlair","description":"Learn Today. Lead Tomorrow.","publisher":{"@id":"https:\/\/data-flair.training\/blogs\/#organization"},"potentialAction":[{"@type":"SearchAction","target":{"@type":"EntryPoint","urlTemplate":"https:\/\/data-flair.training\/blogs\/?s={search_term_string}"},"query-input":{"@type":"PropertyValueSpecification","valueRequired":true,"valueName":"search_term_string"}}],"inLanguage":"en-US"},{"@type":"Organization","@id":"https:\/\/data-flair.training\/blogs\/#organization","name":"DataFlair","url":"https:\/\/data-flair.training\/blogs\/","logo":{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/data-flair.training\/blogs\/#\/schema\/logo\/image\/","url":"https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2016\/07\/Data-Flair.png","contentUrl":"https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2016\/07\/Data-Flair.png","width":106,"height":48,"caption":"DataFlair"},"image":{"@id":"https:\/\/data-flair.training\/blogs\/#\/schema\/logo\/image\/"},"sameAs":["https:\/\/www.facebook.com\/DataFlairWS\/","https:\/\/x.com\/DataFlairWS","https:\/\/www.linkedin.com\/company\/dataflair-web-services-pvt-ltd\/","https:\/\/www.youtube.com\/user\/DataFlairWS"]},{"@type":"Person","@id":"https:\/\/data-flair.training\/blogs\/#\/schema\/person\/2c58ecb4f73a39f0ef993f1ddfcd7b89","name":"DataFlair Team","image":{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/secure.gravatar.com\/avatar\/1ce4a0e3e542444fc73bbebf83e89e8b73e2d95ccb1fcee64da9945f078b97c5?s=96&d=mm&r=g","url":"https:\/\/secure.gravatar.com\/avatar\/1ce4a0e3e542444fc73bbebf83e89e8b73e2d95ccb1fcee64da9945f078b97c5?s=96&d=mm&r=g","contentUrl":"https:\/\/secure.gravatar.com\/avatar\/1ce4a0e3e542444fc73bbebf83e89e8b73e2d95ccb1fcee64da9945f078b97c5?s=96&d=mm&r=g","caption":"DataFlair Team"},"description":"The 
DataFlair Team provides industry-driven content on programming, Java, Python, C++, DSA, AI, ML, data Science, Android, Flutter, MERN, Web Development, and technology. Our expert educators focus on delivering value-packed, easy-to-follow resources for tech enthusiasts and professionals.","url":"https:\/\/data-flair.training\/blogs\/author\/dfteam2\/"}]}},"amp_enabled":true,"_links":{"self":[{"href":"https:\/\/data-flair.training\/blogs\/wp-json\/wp\/v2\/posts\/2595","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/data-flair.training\/blogs\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/data-flair.training\/blogs\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/data-flair.training\/blogs\/wp-json\/wp\/v2\/users\/6"}],"replies":[{"embeddable":true,"href":"https:\/\/data-flair.training\/blogs\/wp-json\/wp\/v2\/comments?post=2595"}],"version-history":[{"count":6,"href":"https:\/\/data-flair.training\/blogs\/wp-json\/wp\/v2\/posts\/2595\/revisions"}],"predecessor-version":[{"id":42458,"href":"https:\/\/data-flair.training\/blogs\/wp-json\/wp\/v2\/posts\/2595\/revisions\/42458"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/data-flair.training\/blogs\/wp-json\/wp\/v2\/media\/42457"}],"wp:attachment":[{"href":"https:\/\/data-flair.training\/blogs\/wp-json\/wp\/v2\/media?parent=2595"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/data-flair.training\/blogs\/wp-json\/wp\/v2\/categories?post=2595"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/data-flair.training\/blogs\/wp-json\/wp\/v2\/tags?post=2595"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}