

{"id":2343,"date":"2017-04-28T11:28:17","date_gmt":"2017-04-28T11:28:17","guid":{"rendered":"http:\/\/data-flair.training\/blogs\/?p=2343"},"modified":"2018-11-21T11:33:18","modified_gmt":"2018-11-21T06:03:18","slug":"shuffling-and-sorting-in-hadoop","status":"publish","type":"post","link":"https:\/\/data-flair.training\/blogs\/shuffling-and-sorting-in-hadoop\/","title":{"rendered":"Shuffling and Sorting in Hadoop MapReduce"},"content":{"rendered":"<h2>1. Objective<\/h2>\n<p>In <a href=\"http:\/\/data-flair.training\/blogs\/hadoop-tutorial-for-beginners\/\"><strong>Hadoop<\/strong><\/a>, the process by which the intermediate output from <strong>mappers<\/strong> is transferred to the <strong>reducer<\/strong> is called shuffling. Each reducer receives one or more keys and their associated values, depending on how the keys are partitioned among the reducers. The intermediate <strong>key-value<\/strong> pairs generated by the mappers are sorted automatically by key.\u00a0In this blog, we will discuss shuffling and sorting in <strong>Hadoop<\/strong> <strong>MapReduce<\/strong> in detail.<\/p>\n<p>Here we will learn what sorting in Hadoop is, what shuffling in Hadoop is, the purpose of the shuffling and sorting phase in<strong><a href=\"http:\/\/data-flair.training\/blogs\/hadoop-mapreduce-introduction-tutorial-comprehensive-guide\/\"> MapReduce<\/a><\/strong>, how MapReduce shuffle works, and how MapReduce sort works. 
We will also learn what is secondary sorting in MapReduce?<\/p>\n<p>To learn Hadoop Cloudera\u00a0CDH5 installation<a href=\"http:\/\/data-flair.training\/blogs\/install-deploy-cloudera-hadoop-cdh5-apache-2-x-centos\/\"> follow this installation guide.<\/a><\/p>\n<div id=\"attachment_43065\" style=\"width: 1210px\" class=\"wp-caption aligncenter\"><a href=\"https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2017\/04\/Shuffling-Sorting-in-hadoop-01-1.jpg\"><img loading=\"lazy\" decoding=\"async\" aria-describedby=\"caption-attachment-43065\" class=\"size-full wp-image-43065\" src=\"https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2017\/04\/Shuffling-Sorting-in-hadoop-01-1.jpg\" alt=\"Shuffling and Sorting in Hadoop MapReduce\" width=\"1200\" height=\"628\" srcset=\"https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2017\/04\/Shuffling-Sorting-in-hadoop-01-1.jpg 1200w, https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2017\/04\/Shuffling-Sorting-in-hadoop-01-1-150x79.jpg 150w, https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2017\/04\/Shuffling-Sorting-in-hadoop-01-1-300x157.jpg 300w, https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2017\/04\/Shuffling-Sorting-in-hadoop-01-1-768x402.jpg 768w, https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2017\/04\/Shuffling-Sorting-in-hadoop-01-1-1024x536.jpg 1024w, https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2017\/04\/Shuffling-Sorting-in-hadoop-01-1-520x272.jpg 520w\" sizes=\"auto, (max-width: 1200px) 100vw, 1200px\" \/><\/a><p id=\"caption-attachment-43065\" class=\"wp-caption-text\">Shuffling and Sorting in Hadoop MapReduce<\/p><\/div>\n<h2>2. 
What are Shuffling and Sorting in Hadoop MapReduce?<\/h2>\n<p>Before we start with shuffle and sort in MapReduce, let us review the other components of MapReduce, such as the<strong><a href=\"http:\/\/data-flair.training\/blogs\/mapper-in-hadoop-mapreduce\/\"> Mapper<\/a><\/strong>, <strong><a href=\"http:\/\/data-flair.training\/blogs\/reducer-in-hadoop-mapreduce\/\">Reducer<\/a><\/strong>, <strong><a href=\"http:\/\/data-flair.training\/blogs\/combiner-in-hadoop-mapreduce-advantages-disadvantages\/\">Combiner<\/a><\/strong>, <strong><a href=\"http:\/\/data-flair.training\/blogs\/partitioner-in-hadoop-mapreduce-hadoop-internals\/\">Partitioner<\/a><\/strong>, and<strong><a href=\"http:\/\/data-flair.training\/blogs\/hadoop-inputformat-types\/\"> InputFormat<\/a><\/strong>.<\/p>\n<p>The <strong>shuffle phase<\/strong> in Hadoop transfers the map output from the mappers to the reducers. The <strong>sort phase<\/strong> covers the merging and sorting of map outputs. Data from the mappers is split among the reducers, sorted by key, and grouped by key, so every reducer obtains all values associated with a given key. The shuffle and sort phases in Hadoop occur simultaneously and are handled by the MapReduce framework.<\/p>\n<p>Let us now look at both processes in detail:<\/p>\n<h2>3. Shuffling in MapReduce<\/h2>\n<p>Shuffling is the process of transferring data from the mappers to the reducers, i.e. the process by which the system sorts the map output and transfers it to the reducers as input. The shuffle phase is therefore necessary for the reducers; without it, they would have no input (or would be missing input from some mappers). Because shuffling can start before the map phase has finished, it saves time and lets the job complete sooner.<\/p>\n<h2>4. Sorting in MapReduce<\/h2>\n<p>The keys generated by the mapper are automatically sorted by the MapReduce framework, i.e. 
before the reducer starts, all intermediate <a href=\"http:\/\/data-flair.training\/blogs\/key-value-pairs-hadoop-mapreduce\/\"><strong>key-value pairs<\/strong> <\/a>generated by the mappers are sorted by key, not by value. The values passed to each reducer are not sorted; they can arrive in any order.<\/p>\n<p>Sorting in Hadoop helps the reducer easily tell when a new reduce() call should start, which saves time for the reducer. A new reduce() call begins when the next key in the sorted input differs from the previous one. Each reduce() call takes a key and its list of values as input and generates key-value pairs as output.<\/p>\n<p>Note that shuffling and sorting in Hadoop MapReduce are not performed at all if you specify zero reducers (setNumReduceTasks(0)). In that case, the MapReduce job stops at the map phase, and the map phase does not include any kind of sorting (so the map phase itself is also faster).<\/p>\n<h2>5. Secondary Sorting in MapReduce<\/h2>\n<p>If we want the values passed to each reducer to be sorted, we use the secondary sorting technique, which lets us sort those values in ascending or descending order.<\/p>\n<h2>6. Conclusion<\/h2>\n<p>In conclusion, shuffling and sorting occur simultaneously to organize the mappers\u2019 intermediate output for the reducers. Shuffling and sorting in Hadoop MapReduce are not performed at all if you specify zero reducers (setNumReduceTasks(0)).<\/p>\n<p>If you found this blog helpful, or have any questions about shuffling and sorting in Hadoop, please leave a comment. 
Hope we will solve your queries.<br \/>\n<strong>See Also-<\/strong><\/p>\n<ul>\n<li><a href=\"http:\/\/data-flair.training\/blogs\/how-hadoop-mapreduce-works\/\">How does Hadoop MapReduce Works?<\/a><\/li>\n<li><a href=\"http:\/\/data-flair.training\/blogs\/mapreduce-interview-questions\/\">50\u00a0Top Hadoop MapReduce Interview Questions and Answers.<\/a><\/li>\n<\/ul>\n<p><a href=\"https:\/\/data-flair.training\/blogs\/shuffling-and-sorting-in-hadoop\/\">Reference<\/a><\/p>\n","protected":false},"excerpt":{"rendered":"<p>1. Objective In Hadoop, the process by which the intermediate output from mappers is transferred to the reducer is called Shuffling. Reducer gets 1 or more keys and associated values on the basis of&#46;&#46;&#46;<\/p>\n","protected":false},"author":6,"featured_media":43065,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[37],"tags":[5295,5297,12863,12865,13003],"class_list":["post-2343","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-mapreduce","tag-hadoop-mapreduce-sorting","tag-hadoop-mapresuce-shuffling","tag-shuffling-and-sorting-phase-in-mapreduce","tag-shuffling-in-hadoop-mapreduce","tag-sorting-in-hadoop-mapreduce"],"yoast_head":"<!-- This site is optimized with the Yoast SEO plugin v27.8 - https:\/\/yoast.com\/product\/yoast-seo-wordpress\/ -->\n<title>Shuffling and Sorting in Hadoop MapReduce - DataFlair<\/title>\n<meta name=\"description\" content=\"Shuffling and Sorting in Hadoop MapReduce Covers What is Shuffling in Hadoop,What is Sorting in MapReduce,how Hadoop shuffle works,how MapReduce sort works?\" \/>\n<meta name=\"robots\" content=\"index, follow, max-snippet:-1, max-image-preview:large, max-video-preview:-1\" \/>\n<link rel=\"canonical\" href=\"https:\/\/data-flair.training\/blogs\/shuffling-and-sorting-in-hadoop\/\" \/>\n<meta property=\"og:locale\" content=\"en_US\" \/>\n<meta 
property=\"og:type\" content=\"article\" \/>\n<meta property=\"og:title\" content=\"Shuffling and Sorting in Hadoop MapReduce - DataFlair\" \/>\n<meta property=\"og:description\" content=\"Shuffling and Sorting in Hadoop MapReduce Covers What is Shuffling in Hadoop,What is Sorting in MapReduce,how Hadoop shuffle works,how MapReduce sort works?\" \/>\n<meta property=\"og:url\" content=\"https:\/\/data-flair.training\/blogs\/shuffling-and-sorting-in-hadoop\/\" \/>\n<meta property=\"og:site_name\" content=\"DataFlair\" \/>\n<meta property=\"article:publisher\" content=\"https:\/\/www.facebook.com\/DataFlairWS\/\" \/>\n<meta property=\"article:published_time\" content=\"2017-04-28T11:28:17+00:00\" \/>\n<meta property=\"article:modified_time\" content=\"2018-11-21T06:03:18+00:00\" \/>\n<meta property=\"og:image\" content=\"https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2017\/04\/Shuffling-Sorting-in-hadoop-01-1.jpg\" \/>\n\t<meta property=\"og:image:width\" content=\"1200\" \/>\n\t<meta property=\"og:image:height\" content=\"628\" \/>\n\t<meta property=\"og:image:type\" content=\"image\/jpeg\" \/>\n<meta name=\"author\" content=\"DataFlair Team\" \/>\n<meta name=\"twitter:card\" content=\"summary_large_image\" \/>\n<meta name=\"twitter:creator\" content=\"@DataFlairWS\" \/>\n<meta name=\"twitter:site\" content=\"@DataFlairWS\" \/>\n<meta name=\"twitter:label1\" content=\"Written by\" \/>\n\t<meta name=\"twitter:data1\" content=\"DataFlair Team\" \/>\n\t<meta name=\"twitter:label2\" content=\"Est. reading time\" \/>\n\t<meta name=\"twitter:data2\" content=\"3 minutes\" \/>\n<!-- \/ Yoast SEO plugin. 
-->","yoast_head_json":{"title":"Shuffling and Sorting in Hadoop MapReduce - DataFlair","description":"Shuffling and Sorting in Hadoop MapReduce Covers What is Shuffling in Hadoop,What is Sorting in MapReduce,how Hadoop shuffle works,how MapReduce sort works?","robots":{"index":"index","follow":"follow","max-snippet":"max-snippet:-1","max-image-preview":"max-image-preview:large","max-video-preview":"max-video-preview:-1"},"canonical":"https:\/\/data-flair.training\/blogs\/shuffling-and-sorting-in-hadoop\/","og_locale":"en_US","og_type":"article","og_title":"Shuffling and Sorting in Hadoop MapReduce - DataFlair","og_description":"Shuffling and Sorting in Hadoop MapReduce Covers What is Shuffling in Hadoop,What is Sorting in MapReduce,how Hadoop shuffle works,how MapReduce sort works?","og_url":"https:\/\/data-flair.training\/blogs\/shuffling-and-sorting-in-hadoop\/","og_site_name":"DataFlair","article_publisher":"https:\/\/www.facebook.com\/DataFlairWS\/","article_published_time":"2017-04-28T11:28:17+00:00","article_modified_time":"2018-11-21T06:03:18+00:00","og_image":[{"width":1200,"height":628,"url":"https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2017\/04\/Shuffling-Sorting-in-hadoop-01-1.jpg","type":"image\/jpeg"}],"author":"DataFlair Team","twitter_card":"summary_large_image","twitter_creator":"@DataFlairWS","twitter_site":"@DataFlairWS","twitter_misc":{"Written by":"DataFlair Team","Est. 
reading time":"3 minutes"},"schema":{"@context":"https:\/\/schema.org","@graph":[{"@type":"Article","@id":"https:\/\/data-flair.training\/blogs\/shuffling-and-sorting-in-hadoop\/#article","isPartOf":{"@id":"https:\/\/data-flair.training\/blogs\/shuffling-and-sorting-in-hadoop\/"},"author":{"name":"DataFlair Team","@id":"https:\/\/data-flair.training\/blogs\/#\/schema\/person\/2c58ecb4f73a39f0ef993f1ddfcd7b89"},"headline":"Shuffling and Sorting in Hadoop MapReduce","datePublished":"2017-04-28T11:28:17+00:00","dateModified":"2018-11-21T06:03:18+00:00","mainEntityOfPage":{"@id":"https:\/\/data-flair.training\/blogs\/shuffling-and-sorting-in-hadoop\/"},"wordCount":593,"commentCount":12,"publisher":{"@id":"https:\/\/data-flair.training\/blogs\/#organization"},"image":{"@id":"https:\/\/data-flair.training\/blogs\/shuffling-and-sorting-in-hadoop\/#primaryimage"},"thumbnailUrl":"https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2017\/04\/Shuffling-Sorting-in-hadoop-01-1.jpg","keywords":["Hadoop Mapreduce sorting","Hadoop MapResuce shuffling","Shuffling and sorting phase in MapReduce","shuffling in hadoop MapReduce","sorting in hadoop Mapreduce"],"articleSection":["MapReduce Tutorials"],"inLanguage":"en-US","potentialAction":[{"@type":"CommentAction","name":"Comment","target":["https:\/\/data-flair.training\/blogs\/shuffling-and-sorting-in-hadoop\/#respond"]}]},{"@type":"WebPage","@id":"https:\/\/data-flair.training\/blogs\/shuffling-and-sorting-in-hadoop\/","url":"https:\/\/data-flair.training\/blogs\/shuffling-and-sorting-in-hadoop\/","name":"Shuffling and Sorting in Hadoop MapReduce - 
DataFlair","isPartOf":{"@id":"https:\/\/data-flair.training\/blogs\/#website"},"primaryImageOfPage":{"@id":"https:\/\/data-flair.training\/blogs\/shuffling-and-sorting-in-hadoop\/#primaryimage"},"image":{"@id":"https:\/\/data-flair.training\/blogs\/shuffling-and-sorting-in-hadoop\/#primaryimage"},"thumbnailUrl":"https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2017\/04\/Shuffling-Sorting-in-hadoop-01-1.jpg","datePublished":"2017-04-28T11:28:17+00:00","dateModified":"2018-11-21T06:03:18+00:00","description":"Shuffling and Sorting in Hadoop MapReduce Covers What is Shuffling in Hadoop,What is Sorting in MapReduce,how Hadoop shuffle works,how MapReduce sort works?","breadcrumb":{"@id":"https:\/\/data-flair.training\/blogs\/shuffling-and-sorting-in-hadoop\/#breadcrumb"},"inLanguage":"en-US","potentialAction":[{"@type":"ReadAction","target":["https:\/\/data-flair.training\/blogs\/shuffling-and-sorting-in-hadoop\/"]}]},{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/data-flair.training\/blogs\/shuffling-and-sorting-in-hadoop\/#primaryimage","url":"https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2017\/04\/Shuffling-Sorting-in-hadoop-01-1.jpg","contentUrl":"https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2017\/04\/Shuffling-Sorting-in-hadoop-01-1.jpg","width":1200,"height":628,"caption":"Shuffling and Sorting in Hadoop MapReduce"},{"@type":"BreadcrumbList","@id":"https:\/\/data-flair.training\/blogs\/shuffling-and-sorting-in-hadoop\/#breadcrumb","itemListElement":[{"@type":"ListItem","position":1,"name":"Blog Home","item":"https:\/\/data-flair.training\/blogs\/"},{"@type":"ListItem","position":2,"name":"MapReduce Tutorials","item":"https:\/\/data-flair.training\/blogs\/category\/mapreduce\/"},{"@type":"ListItem","position":3,"name":"Shuffling and Sorting in Hadoop 
MapReduce"}]},{"@type":"WebSite","@id":"https:\/\/data-flair.training\/blogs\/#website","url":"https:\/\/data-flair.training\/blogs\/","name":"DataFlair","description":"Learn Today. Lead Tomorrow.","publisher":{"@id":"https:\/\/data-flair.training\/blogs\/#organization"},"potentialAction":[{"@type":"SearchAction","target":{"@type":"EntryPoint","urlTemplate":"https:\/\/data-flair.training\/blogs\/?s={search_term_string}"},"query-input":{"@type":"PropertyValueSpecification","valueRequired":true,"valueName":"search_term_string"}}],"inLanguage":"en-US"},{"@type":"Organization","@id":"https:\/\/data-flair.training\/blogs\/#organization","name":"DataFlair","url":"https:\/\/data-flair.training\/blogs\/","logo":{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/data-flair.training\/blogs\/#\/schema\/logo\/image\/","url":"https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2016\/07\/Data-Flair.png","contentUrl":"https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2016\/07\/Data-Flair.png","width":106,"height":48,"caption":"DataFlair"},"image":{"@id":"https:\/\/data-flair.training\/blogs\/#\/schema\/logo\/image\/"},"sameAs":["https:\/\/www.facebook.com\/DataFlairWS\/","https:\/\/x.com\/DataFlairWS","https:\/\/www.linkedin.com\/company\/dataflair-web-services-pvt-ltd\/","https:\/\/www.youtube.com\/user\/DataFlairWS"]},{"@type":"Person","@id":"https:\/\/data-flair.training\/blogs\/#\/schema\/person\/2c58ecb4f73a39f0ef993f1ddfcd7b89","name":"DataFlair Team","image":{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/secure.gravatar.com\/avatar\/1ce4a0e3e542444fc73bbebf83e89e8b73e2d95ccb1fcee64da9945f078b97c5?s=96&d=mm&r=g","url":"https:\/\/secure.gravatar.com\/avatar\/1ce4a0e3e542444fc73bbebf83e89e8b73e2d95ccb1fcee64da9945f078b97c5?s=96&d=mm&r=g","contentUrl":"https:\/\/secure.gravatar.com\/avatar\/1ce4a0e3e542444fc73bbebf83e89e8b73e2d95ccb1fcee64da9945f078b97c5?s=96&d=mm&r=g","caption":"DataFlair Team"},"description":"The 
DataFlair Team provides industry-driven content on programming, Java, Python, C++, DSA, AI, ML, data Science, Android, Flutter, MERN, Web Development, and technology. Our expert educators focus on delivering value-packed, easy-to-follow resources for tech enthusiasts and professionals.","url":"https:\/\/data-flair.training\/blogs\/author\/dfteam2\/"}]}},"amp_enabled":true,"_links":{"self":[{"href":"https:\/\/data-flair.training\/blogs\/wp-json\/wp\/v2\/posts\/2343","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/data-flair.training\/blogs\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/data-flair.training\/blogs\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/data-flair.training\/blogs\/wp-json\/wp\/v2\/users\/6"}],"replies":[{"embeddable":true,"href":"https:\/\/data-flair.training\/blogs\/wp-json\/wp\/v2\/comments?post=2343"}],"version-history":[{"count":6,"href":"https:\/\/data-flair.training\/blogs\/wp-json\/wp\/v2\/posts\/2343\/revisions"}],"predecessor-version":[{"id":43066,"href":"https:\/\/data-flair.training\/blogs\/wp-json\/wp\/v2\/posts\/2343\/revisions\/43066"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/data-flair.training\/blogs\/wp-json\/wp\/v2\/media\/43065"}],"wp:attachment":[{"href":"https:\/\/data-flair.training\/blogs\/wp-json\/wp\/v2\/media?parent=2343"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/data-flair.training\/blogs\/wp-json\/wp\/v2\/categories?post=2343"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/data-flair.training\/blogs\/wp-json\/wp\/v2\/tags?post=2343"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}