

{"id":2031,"date":"2017-04-17T06:57:30","date_gmt":"2017-04-17T06:57:30","guid":{"rendered":"http:\/\/data-flair.training\/blogs\/?p=2031"},"modified":"2021-08-25T22:33:32","modified_gmt":"2021-08-25T17:03:32","slug":"how-hadoop-works-internally","status":"publish","type":"post","link":"https:\/\/data-flair.training\/blogs\/how-hadoop-works-internally\/","title":{"rendered":"How Hadoop Works Internally &#8211; Inside Hadoop"},"content":{"rendered":"<p><a href=\"http:\/\/data-flair.training\/blogs\/hadoop-tutorial-for-beginners\/\"><strong>Apache Hadoop<\/strong><\/a> is an open source software framework that stores data in a distributed manner and processes that data in parallel. Hadoop provides a reliable storage layer \u2013 <strong>HDFS<\/strong>, a batch processing engine \u2013 <strong>MapReduce<\/strong> and a resource management layer \u2013 <strong>YARN<\/strong>. In this tutorial on <strong>&#8216;How Hadoop works internally&#8217;<\/strong>, we will learn what Hadoop is, how it works, its core components and daemons, the roles of HDFS, MapReduce, and YARN, and the steps Hadoop follows to store and process data.<\/p>\n<div id=\"attachment_52018\" style=\"width: 1210px\" class=\"wp-caption aligncenter\"><a href=\"https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2017\/04\/How-Hadoop-Works.jpg\"><img loading=\"lazy\" decoding=\"async\" aria-describedby=\"caption-attachment-52018\" class=\"size-full wp-image-52018\" src=\"https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2017\/04\/How-Hadoop-Works.jpg\" alt=\"How Hadoop Works Internally - Inside Hadoop\" width=\"1200\" height=\"628\" srcset=\"https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2017\/04\/How-Hadoop-Works.jpg 1200w, https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2017\/04\/How-Hadoop-Works-150x79.jpg 150w, https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2017\/04\/How-Hadoop-Works-300x157.jpg 300w, https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2017\/04\/How-Hadoop-Works-768x402.jpg 768w, https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2017\/04\/How-Hadoop-Works-1024x536.jpg 1024w, 
https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2017\/04\/How-Hadoop-Works-520x272.jpg 520w\" sizes=\"auto, (max-width: 1200px) 100vw, 1200px\" \/><\/a><p id=\"caption-attachment-52018\" class=\"wp-caption-text\">How Hadoop Works Internally &#8211; Inside Hadoop<\/p><\/div>\n<h2>What is Hadoop?<\/h2>\n<p>Before learning how Hadoop works, let&#8217;s brush up on the basic Hadoop concepts. Apache Hadoop is a set of open-source software utilities that use a network of many computers to solve problems involving massive amounts of data. It provides a software framework for distributed storage and distributed computing. It divides a file into a number of blocks and stores them across a cluster of machines. Hadoop also achieves fault tolerance by replicating the blocks across the cluster. It performs distributed processing by dividing a job into a number of independent tasks, which run in parallel across the cluster.<\/p>\n<h2>Hadoop Components and Daemons<\/h2>\n<p>You can&#8217;t understand how Hadoop works without knowing its core components. Hadoop consists of three layers (core components):<\/p>\n<p><strong>HDFS \u2013<\/strong> <strong>Hadoop Distributed File System<\/strong> is the storage layer of Hadoop. As the name suggests, it stores data in a distributed manner. Each file is divided into a number of blocks, which are spread across a cluster of commodity hardware.<\/p>\n<p><strong>MapReduce \u2013<\/strong> This is the processing engine of Hadoop.<strong> MapReduce works on the principle of distributed processing<\/strong>. It divides the task submitted by the user into a number of independent subtasks. These subtasks execute in parallel, thereby increasing throughput.<\/p>\n<p><strong>YARN &#8211;<\/strong> <strong>Yet Another Resource Negotiator<\/strong> provides resource management for Hadoop. Two daemons run for YARN. 
One is the NodeManager on the slave machines and the other is the ResourceManager on the master node. YARN allocates cluster resources among the applications competing for them.<\/p>\n<p><a href=\"https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2017\/04\/Basic-Hadoop-Architecture.gif\"><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter wp-image-52000 size-full\" src=\"https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2017\/04\/Basic-Hadoop-Architecture.gif\" alt=\"Hadoo Works\" width=\"800\" height=\"450\" \/><\/a><\/p>\n<p><a href=\"https:\/\/data-flair.training\/blogs\/hadoop-ecosystem\/\"><strong>Learn about all the Hadoop Ecosystem Components in just 7 mins.<\/strong><\/a><\/p>\n<p>Daemons are processes that run in the background. The Hadoop daemons are:<\/p>\n<p><strong>a)<\/strong> <strong>NameNode<\/strong> \u2013 It runs on the master node for HDFS.<\/p>\n<p><strong>b)<\/strong> <strong>DataNode<\/strong> \u2013 It runs on the slave nodes for HDFS.<\/p>\n<p><strong>c)<\/strong> <strong>ResourceManager<\/strong> \u2013 It runs on the <a href=\"http:\/\/data-flair.training\/blogs\/hadoop-yarn-tutorial\/\"><strong>YARN<\/strong><\/a> master node.<\/p>\n<p><strong>d)<\/strong> <strong>NodeManager<\/strong> \u2013 It runs on the YARN slave nodes.<\/p>\n<p>All four daemons must be running for Hadoop to be functional.<\/p>\n<p><strong>Read: <a href=\"https:\/\/data-flair.training\/blogs\/hadoop-distributed-cache\/\">Distributed Cache in Hadoop<\/a><\/strong><\/p>\n<h2>How Hadoop Works<\/h2>\n<p>Hadoop performs distributed processing of huge data sets across a cluster of commodity servers, working on many machines simultaneously. To process any data, the client submits the data and the program to Hadoop. <strong>HDFS<\/strong> stores the data, <strong>MapReduce<\/strong> processes the data, and YARN allocates the resources and schedules the tasks.<\/p>\n<p>Let&#8217;s discuss in detail how Hadoop works:<\/p>\n<h3>i. 
HDFS<\/h3>\n<p><a href=\"https:\/\/data-flair.training\/blogs\/hadoop-hdfs-architecture\/\"><strong>Hadoop Distributed File System has<\/strong> a <strong>master-slave topology<\/strong><\/a>. It runs two daemons: NameNode and DataNode.<\/p>\n<p><a href=\"https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2017\/04\/Data-Storage-in-HDFS.gif\"><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter size-full wp-image-51996\" src=\"https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2017\/04\/Data-Storage-in-HDFS.gif\" alt=\"Hadoop working \" width=\"800\" height=\"450\" \/><\/a><\/p>\n<h4>NameNode<\/h4>\n<p>NameNode is the daemon running on the master machine. It is the centerpiece of an HDFS file system. NameNode stores the directory tree of all files in the file system and tracks where across the cluster the file data resides. It does not store the data contained in these files.<\/p>\n<p>When client applications want to add, copy, move, or delete a file, they interact with the NameNode. The NameNode responds to the client&#8217;s request by returning a list of the relevant DataNode servers where the data lives.<\/p>\n<p><strong>Recommended Reading &#8211;<\/strong> <strong><a href=\"https:\/\/data-flair.training\/blogs\/hadoop-hdfs-namenode-high-availability\/\">NameNode High Availability<\/a><\/strong><\/p>\n<h4>DataNode<\/h4>\n<p>The DataNode daemon runs on the slave nodes. It stores the actual data blocks of HDFS. In a functional file system, data is replicated across more than one DataNode.<\/p>\n<p>On startup, a DataNode connects to the NameNode. It then keeps waiting for requests from the NameNode to access data. Once the NameNode provides the location of the data, client applications can talk directly to a DataNode. While replicating data, DataNode instances can talk to each other.<\/p>\n<h4>Replica Placement<\/h4>\n<p>The placement of replicas determines HDFS reliability and performance. 
Optimization of replica placement sets HDFS apart from other distributed file systems. Large HDFS instances run on a cluster of computers spread across many racks. Communication between nodes on different racks has to go through switches. In most cases, the network bandwidth between nodes on the same rack is greater than that between machines on separate racks.<\/p>\n<p>The <strong><a href=\"https:\/\/data-flair.training\/blogs\/rack-awareness-hadoop-hdfs\/\">rack awareness algorithm<\/a><\/strong> determines the rack id of each DataNode. Under a simple policy, each replica is placed on a unique rack. This prevents data loss in the event of a rack failure and lets reads use bandwidth from multiple racks. However, this policy increases the cost of writes, because each write has to transfer blocks to multiple racks.<\/p>\n<p>Let us assume that the replication factor is three. Suppose HDFS\u2019s placement policy places one replica on the local rack and the other two replicas on a single remote rack. This policy cuts the inter-rack write traffic, thereby improving write performance. The chance of a rack failure is much lower than that of a node failure, so this policy does not affect data reliability and availability. It does, however, reduce the aggregate network bandwidth used when reading data, because a block is placed on only two unique racks rather than three.<\/p>\n<h3>ii. MapReduce<\/h3>\n<p>The <strong><a href=\"https:\/\/data-flair.training\/blogs\/hadoop-mapreduce-tutorial\/\">general idea of the MapReduce<\/a><\/strong> algorithm is to process the data in parallel on your distributed cluster. 
It subsequently combines the partial results into the desired output.<\/p>\n<p><a href=\"https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2017\/04\/How-MapReduce-works.gif\"><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter size-full wp-image-51997\" src=\"https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2017\/04\/How-MapReduce-works.gif\" alt=\"Hadoop MapReduce Working \" width=\"800\" height=\"450\" \/><\/a><\/p>\n<p>Hadoop MapReduce includes several stages:<\/p>\n<ul>\n<li>In the first step, the program locates and reads the input file containing the raw data.<\/li>\n<li>As the file format is arbitrary, the data must be converted into something the program can process. The InputFormat and the RecordReader (RR) do this job.<\/li>\n<\/ul>\n<p>InputFormat divides the file into smaller logical pieces called InputSplits.<\/p>\n<p>Then the <strong><a href=\"https:\/\/data-flair.training\/blogs\/hadoop-recordreader\/\">RecordReader<\/a><\/strong> transforms the raw data of each split for processing by the map. It outputs a list of key-value pairs.<\/p>\n<p>Once the mapper processes these key-value pairs, the result goes to the OutputCollector. Another facility, the Reporter, reports the progress and status of the map task.<\/p>\n<ul>\n<li>In the next step, the Reduce function processes each key together with the values the mappers emitted for that key.<\/li>\n<li>Finally, OutputFormat organizes the key-value pairs from the Reducer and writes them to HDFS.<\/li>\n<\/ul>\n<p>Being the heart of the Hadoop system, MapReduce processes the data in a highly resilient, fault-tolerant manner.<\/p>\n<h3>iii. YARN<\/h3>\n<p><strong><a href=\"https:\/\/data-flair.training\/blogs\/hadoop-yarn-tutorial\/\">YARN<\/a><\/strong> splits the functions of resource management and job scheduling\/monitoring into separate daemons. There is one global ResourceManager and a per-application ApplicationMaster. 
An application can be either a single job or a DAG of jobs.<\/p>\n<p>The ResourceManager has two components \u2013 the Scheduler and the ApplicationsManager.<\/p>\n<p>The <strong><a href=\"https:\/\/data-flair.training\/blogs\/hadoop-schedulers\/\">scheduler<\/a><\/strong> is a pure scheduler, i.e. it does not track the status of running applications. It only allocates resources to the various competing applications. It also does not restart tasks that fail because of hardware or application errors. The scheduler allocates resources based on the abstract notion of a container. A container is a fraction of a node&#8217;s resources such as CPU, memory, disk, and network.<\/p>\n<p>The ApplicationsManager has the following tasks:<\/p>\n<ul>\n<li>Accepts job submissions from clients.<\/li>\n<li>Negotiates the first container for running the application-specific ApplicationMaster.<\/li>\n<li>Restarts the ApplicationMaster container on failure.<\/li>\n<\/ul>\n<p>The ApplicationMaster has the following responsibilities:<\/p>\n<ul>\n<li>Negotiates containers from the Scheduler.<\/li>\n<li>Tracks container status and monitors progress.<\/li>\n<\/ul>\n<p>YARN supports resource reservation via the ReservationSystem. With it, a user can reserve resources for a particular job by specifying a profile of resources over time together with temporal constraints such as deadlines. The ReservationSystem makes sure that the resources are available to the job until its completion. It also performs admission control for reservations.<\/p>\n<p>YARN can scale beyond a few thousand nodes via YARN Federation. YARN Federation allows multiple sub-clusters to be wired together into a single massive cluster. We can use many independent clusters together for a single large job. 
It can be used to build a very large-scale system.<\/p>\n<p>Let us summarize how <strong><a href=\"https:\/\/hortonworks.com\/apache\/hadoop\/\">Hadoop<\/a><\/strong> works step by step:<\/p>\n<ul>\n<li>Input data is broken into blocks (<strong>128 MB<\/strong> each by default), and the blocks are then distributed to different nodes.<\/li>\n<li>Once all the blocks of the data are stored on DataNodes, the user can process the data.<\/li>\n<li>The ResourceManager then schedules the program (submitted by the user) on individual nodes.<\/li>\n<li>Once all the nodes have processed the data, the output is written back to HDFS.<\/li>\n<\/ul>\n<p>So, this was all about how Hadoop works.<\/p>\n<h2>Conclusion<\/h2>\n<p>In conclusion, the client first submits the data and the program. HDFS stores that data and MapReduce processes it.\u00a0Now that we have covered the basics of Hadoop and how it works, let us learn\u00a0<a href=\"http:\/\/data-flair.training\/blogs\/install-hadoop-on-single-machine\/\"><strong>how to install Hadoop on a single node<\/strong><\/a> and on a<strong> multi-node<\/strong> cluster.<\/p>\n<p>Drop a comment if you liked the tutorial or have any questions or feedback on &#8216;How Hadoop Works&#8217;, and we will get back to you.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Apache Hadoop is an open source software framework that stores data in a distributed manner and processes that data in parallel. 
Hadoop provides the world\u2019s most reliable storage layer \u2013 HDFS, a batch processing&#46;&#46;&#46;<\/p>\n","protected":false},"author":7,"featured_media":52018,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[22],"tags":[5236,5283,5342,5349,5548,5881],"class_list":["post-2031","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-hadoop","tag-hadoop-daemons","tag-hadoop-mapreduce","tag-hadoop-tutorial","tag-hadoop-working","tag-hdfs","tag-how-hadoop-works"],"yoast_head":"<!-- This site is optimized with the Yoast SEO plugin v27.4 - https:\/\/yoast.com\/product\/yoast-seo-wordpress\/ -->\n<title>How Hadoop Works Internally - Inside Hadoop - DataFlair<\/title>\n<meta name=\"description\" content=\"Learn what is Big data Hadoop,components of Hadoop,daemons in Hadoop,roles of HDFS &amp; MapReduce in Hadoop and steps to understand How Hadoop works internally\" \/>\n<meta name=\"robots\" content=\"index, follow, max-snippet:-1, max-image-preview:large, max-video-preview:-1\" \/>\n<link rel=\"canonical\" href=\"https:\/\/data-flair.training\/blogs\/how-hadoop-works-internally\/\" \/>\n<meta property=\"og:locale\" content=\"en_US\" \/>\n<meta property=\"og:type\" content=\"article\" \/>\n<meta property=\"og:title\" content=\"How Hadoop Works Internally - Inside Hadoop - DataFlair\" \/>\n<meta property=\"og:description\" content=\"Learn what is Big data Hadoop,components of Hadoop,daemons in Hadoop,roles of HDFS &amp; MapReduce in Hadoop and steps to understand How Hadoop works internally\" \/>\n<meta property=\"og:url\" content=\"https:\/\/data-flair.training\/blogs\/how-hadoop-works-internally\/\" \/>\n<meta property=\"og:site_name\" content=\"DataFlair\" \/>\n<meta property=\"article:publisher\" content=\"https:\/\/www.facebook.com\/DataFlairWS\/\" \/>\n<meta property=\"article:published_time\" content=\"2017-04-17T06:57:30+00:00\" 
\/>\n<meta property=\"article:modified_time\" content=\"2021-08-25T17:03:32+00:00\" \/>\n<meta property=\"og:image\" content=\"https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2017\/04\/How-Hadoop-Works.jpg\" \/>\n\t<meta property=\"og:image:width\" content=\"1200\" \/>\n\t<meta property=\"og:image:height\" content=\"628\" \/>\n\t<meta property=\"og:image:type\" content=\"image\/jpeg\" \/>\n<meta name=\"author\" content=\"DataFlair Team\" \/>\n<meta name=\"twitter:card\" content=\"summary_large_image\" \/>\n<meta name=\"twitter:creator\" content=\"@DataFlairWS\" \/>\n<meta name=\"twitter:site\" content=\"@DataFlairWS\" \/>\n<meta name=\"twitter:label1\" content=\"Written by\" \/>\n\t<meta name=\"twitter:data1\" content=\"DataFlair Team\" \/>\n\t<meta name=\"twitter:label2\" content=\"Est. reading time\" \/>\n\t<meta name=\"twitter:data2\" content=\"7 minutes\" \/>\n<!-- \/ Yoast SEO plugin. -->","yoast_head_json":{"title":"How Hadoop Works Internally - Inside Hadoop - DataFlair","description":"Learn what is Big data Hadoop,components of Hadoop,daemons in Hadoop,roles of HDFS & MapReduce in Hadoop and steps to understand How Hadoop works internally","robots":{"index":"index","follow":"follow","max-snippet":"max-snippet:-1","max-image-preview":"max-image-preview:large","max-video-preview":"max-video-preview:-1"},"canonical":"https:\/\/data-flair.training\/blogs\/how-hadoop-works-internally\/","og_locale":"en_US","og_type":"article","og_title":"How Hadoop Works Internally - Inside Hadoop - DataFlair","og_description":"Learn what is Big data Hadoop,components of Hadoop,daemons in Hadoop,roles of HDFS & MapReduce in Hadoop and steps to understand How Hadoop works 
internally","og_url":"https:\/\/data-flair.training\/blogs\/how-hadoop-works-internally\/","og_site_name":"DataFlair","article_publisher":"https:\/\/www.facebook.com\/DataFlairWS\/","article_published_time":"2017-04-17T06:57:30+00:00","article_modified_time":"2021-08-25T17:03:32+00:00","og_image":[{"width":1200,"height":628,"url":"https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2017\/04\/How-Hadoop-Works.jpg","type":"image\/jpeg"}],"author":"DataFlair Team","twitter_card":"summary_large_image","twitter_creator":"@DataFlairWS","twitter_site":"@DataFlairWS","twitter_misc":{"Written by":"DataFlair Team","Est. reading time":"7 minutes"},"schema":{"@context":"https:\/\/schema.org","@graph":[{"@type":"Article","@id":"https:\/\/data-flair.training\/blogs\/how-hadoop-works-internally\/#article","isPartOf":{"@id":"https:\/\/data-flair.training\/blogs\/how-hadoop-works-internally\/"},"author":{"name":"DataFlair Team","@id":"https:\/\/data-flair.training\/blogs\/#\/schema\/person\/beb0cab24b7aa54423a3b50e669a9dcd"},"headline":"How Hadoop Works Internally &#8211; Inside Hadoop","datePublished":"2017-04-17T06:57:30+00:00","dateModified":"2021-08-25T17:03:32+00:00","mainEntityOfPage":{"@id":"https:\/\/data-flair.training\/blogs\/how-hadoop-works-internally\/"},"wordCount":1487,"commentCount":6,"publisher":{"@id":"https:\/\/data-flair.training\/blogs\/#organization"},"image":{"@id":"https:\/\/data-flair.training\/blogs\/how-hadoop-works-internally\/#primaryimage"},"thumbnailUrl":"https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2017\/04\/How-Hadoop-Works.jpg","keywords":["Hadoop Daemons","hadoop mapreduce","hadoop tutorial","Hadoop working","hdfs","how hadoop works"],"articleSection":["Hadoop 
Tutorials"],"inLanguage":"en-US","potentialAction":[{"@type":"CommentAction","name":"Comment","target":["https:\/\/data-flair.training\/blogs\/how-hadoop-works-internally\/#respond"]}]},{"@type":"WebPage","@id":"https:\/\/data-flair.training\/blogs\/how-hadoop-works-internally\/","url":"https:\/\/data-flair.training\/blogs\/how-hadoop-works-internally\/","name":"How Hadoop Works Internally - Inside Hadoop - DataFlair","isPartOf":{"@id":"https:\/\/data-flair.training\/blogs\/#website"},"primaryImageOfPage":{"@id":"https:\/\/data-flair.training\/blogs\/how-hadoop-works-internally\/#primaryimage"},"image":{"@id":"https:\/\/data-flair.training\/blogs\/how-hadoop-works-internally\/#primaryimage"},"thumbnailUrl":"https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2017\/04\/How-Hadoop-Works.jpg","datePublished":"2017-04-17T06:57:30+00:00","dateModified":"2021-08-25T17:03:32+00:00","description":"Learn what is Big data Hadoop,components of Hadoop,daemons in Hadoop,roles of HDFS & MapReduce in Hadoop and steps to understand How Hadoop works internally","breadcrumb":{"@id":"https:\/\/data-flair.training\/blogs\/how-hadoop-works-internally\/#breadcrumb"},"inLanguage":"en-US","potentialAction":[{"@type":"ReadAction","target":["https:\/\/data-flair.training\/blogs\/how-hadoop-works-internally\/"]}]},{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/data-flair.training\/blogs\/how-hadoop-works-internally\/#primaryimage","url":"https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2017\/04\/How-Hadoop-Works.jpg","contentUrl":"https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2017\/04\/How-Hadoop-Works.jpg","width":1200,"height":628,"caption":"How Hadoop Works Internally - Inside Hadoop"},{"@type":"BreadcrumbList","@id":"https:\/\/data-flair.training\/blogs\/how-hadoop-works-internally\/#breadcrumb","itemListElement":[{"@type":"ListItem","position":1,"name":"Blog 
Home","item":"https:\/\/data-flair.training\/blogs\/"},{"@type":"ListItem","position":2,"name":"Hadoop Tutorials","item":"https:\/\/data-flair.training\/blogs\/category\/hadoop\/"},{"@type":"ListItem","position":3,"name":"How Hadoop Works Internally &#8211; Inside Hadoop"}]},{"@type":"WebSite","@id":"https:\/\/data-flair.training\/blogs\/#website","url":"https:\/\/data-flair.training\/blogs\/","name":"DataFlair","description":"Learn Today. Lead Tomorrow.","publisher":{"@id":"https:\/\/data-flair.training\/blogs\/#organization"},"potentialAction":[{"@type":"SearchAction","target":{"@type":"EntryPoint","urlTemplate":"https:\/\/data-flair.training\/blogs\/?s={search_term_string}"},"query-input":{"@type":"PropertyValueSpecification","valueRequired":true,"valueName":"search_term_string"}}],"inLanguage":"en-US"},{"@type":"Organization","@id":"https:\/\/data-flair.training\/blogs\/#organization","name":"DataFlair","url":"https:\/\/data-flair.training\/blogs\/","logo":{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/data-flair.training\/blogs\/#\/schema\/logo\/image\/","url":"https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2016\/07\/Data-Flair.png","contentUrl":"https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2016\/07\/Data-Flair.png","width":106,"height":48,"caption":"DataFlair"},"image":{"@id":"https:\/\/data-flair.training\/blogs\/#\/schema\/logo\/image\/"},"sameAs":["https:\/\/www.facebook.com\/DataFlairWS\/","https:\/\/x.com\/DataFlairWS","https:\/\/www.linkedin.com\/company\/dataflair-web-services-pvt-ltd\/","https:\/\/www.youtube.com\/user\/DataFlairWS"]},{"@type":"Person","@id":"https:\/\/data-flair.training\/blogs\/#\/schema\/person\/beb0cab24b7aa54423a3b50e669a9dcd","name":"DataFlair 
Team","image":{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/secure.gravatar.com\/avatar\/c322416204232f4dd97ef3901b0a499a5d34d7ba7fe333f4bfe53a907873d293?s=96&d=mm&r=g","url":"https:\/\/secure.gravatar.com\/avatar\/c322416204232f4dd97ef3901b0a499a5d34d7ba7fe333f4bfe53a907873d293?s=96&d=mm&r=g","contentUrl":"https:\/\/secure.gravatar.com\/avatar\/c322416204232f4dd97ef3901b0a499a5d34d7ba7fe333f4bfe53a907873d293?s=96&d=mm&r=g","caption":"DataFlair Team"},"description":"DataFlair Team specializes in creating clear, actionable content on programming, Java, Python, C++, DSA, AI, ML, data Science, Android, Flutter, MERN, Web Development, and technology. Backed by industry expertise, we make learning easy and career-oriented for beginners and pros alike.","url":"https:\/\/data-flair.training\/blogs\/author\/dfteam3\/"}]}},"amp_enabled":true,"_links":{"self":[{"href":"https:\/\/data-flair.training\/blogs\/wp-json\/wp\/v2\/posts\/2031","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/data-flair.training\/blogs\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/data-flair.training\/blogs\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/data-flair.training\/blogs\/wp-json\/wp\/v2\/users\/7"}],"replies":[{"embeddable":true,"href":"https:\/\/data-flair.training\/blogs\/wp-json\/wp\/v2\/comments?post=2031"}],"version-history":[{"count":10,"href":"https:\/\/data-flair.training\/blogs\/wp-json\/wp\/v2\/posts\/2031\/revisions"}],"predecessor-version":[{"id":52023,"href":"https:\/\/data-flair.training\/blogs\/wp-json\/wp\/v2\/posts\/2031\/revisions\/52023"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/data-flair.training\/blogs\/wp-json\/wp\/v2\/media\/52018"}],"wp:attachment":[{"href":"https:\/\/data-flair.training\/blogs\/wp-json\/wp\/v2\/media?parent=2031"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/data-flair.training\/blogs\/wp-json\/wp\/v2\/categories?post=2031"},{"taxono
my":"post_tag","embeddable":true,"href":"https:\/\/data-flair.training\/blogs\/wp-json\/wp\/v2\/tags?post=2031"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}