

{"id":10720,"date":"2018-03-14T00:00:05","date_gmt":"2018-03-14T00:00:05","guid":{"rendered":"https:\/\/data-flair.training\/blogs\/?p=10720"},"modified":"2018-03-14T00:00:05","modified_gmt":"2018-03-14T00:00:05","slug":"bucket-map-join","status":"publish","type":"post","link":"https:\/\/data-flair.training\/blogs\/bucket-map-join\/","title":{"rendered":"Bucket Map Join in Hive &#8211; Tips &amp; Working"},"content":{"rendered":"<p><span style=\"font-weight: 400\">In the last article, we discussed<strong> Map Side Join in\u00a0Hive<\/strong>. When the tables are large and all the tables used in the join are bucketed on the join columns, we can use a Bucket Map Join in<strong>\u00a0Hive<\/strong>. <\/span><\/p>\n<p><span style=\"font-weight: 400\">In this article, we cover the concept of the Apache Hive Bucket Map Join, including how it works, its use cases, its disadvantages, and an example.<\/span><\/p>\n<h2><span style=\"font-weight: 400\">Introduction to Bucket Map Join<\/span><\/h2>\n<p><span style=\"font-weight: 400\">In <strong>Apache Hive<\/strong>, we use the Bucket Map Join feature when the tables are large and all the tables used in the join are bucketed on the join columns. 
Moreover, in this type of join, the number of buckets in one table must be a multiple of the number of buckets in the other table.<\/span><\/p>\n<div id=\"attachment_10726\" style=\"width: 1210px\" class=\"wp-caption aligncenter\"><a href=\"https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2018\/03\/Bucket-Map-Join.jpg\"><img loading=\"lazy\" decoding=\"async\" aria-describedby=\"caption-attachment-10726\" class=\"wp-image-10726 size-full\" src=\"https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2018\/03\/Bucket-Map-Join.jpg\" alt=\"Bucket Map Join\" width=\"1200\" height=\"719\" srcset=\"https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2018\/03\/Bucket-Map-Join.jpg 1200w, https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2018\/03\/Bucket-Map-Join-150x90.jpg 150w, https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2018\/03\/Bucket-Map-Join-300x180.jpg 300w, https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2018\/03\/Bucket-Map-Join-768x460.jpg 768w, https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2018\/03\/Bucket-Map-Join-1024x614.jpg 1024w\" sizes=\"auto, (max-width: 1200px) 100vw, 1200px\" \/><\/a><p id=\"caption-attachment-10726\" class=\"wp-caption-text\">How Bucket Map Join Works<\/p><\/div>\n<p><span style=\"font-weight: 400\">Let\u2019s understand this with an example. If one table has 2 buckets, then the other table must have either 2 buckets or a multiple of 2 buckets (2, 4, 6, and so on). If this condition is satisfied, the join can be done on the mapper side only. <\/span><\/p>\n<p><span style=\"font-weight: 400\">Otherwise, a normal join is performed. In other words, only the required buckets are fetched on the mapper side, not the complete table. <\/span><\/p>\n<p><span style=\"font-weight: 400\">Hence, only the matching buckets of the smaller tables are replicated onto each mapper. 
As a result, the efficiency of the query improves drastically. Note that a bucket map join does not require the data within each bucket to be sorted.<\/span><span style=\"font-weight: 400\"><br \/>\n<\/span><span style=\"font-weight: 400\"><br \/>\n<\/span><span style=\"font-weight: 400\">Also, note that Hive does not use a bucket map join by default. So, we need to set the following property to true for the query to run as a bucket map join:<\/span><span style=\"font-weight: 400\"><br \/>\n<\/span><span style=\"font-weight: 400\"><br \/>\n<\/span><span style=\"font-weight: 400\">set hive.optimize.bucketmapjoin = true;<\/span><\/p>\n<h2><span style=\"font-weight: 400\">How does it work in Hive?<\/span><\/h2>\n<p><span style=\"font-weight: 400\">Basically, the join is done in the mapper only. In other words, the mapper processing bucket 1 of table A will only fetch bucket 1 of table B. <\/span><\/p>\n<h2><span style=\"font-weight: 400\">Use Case of Bucket Map Join<\/span><\/h2>\n<p><span style=\"font-weight: 400\">To be more specific, we use this feature when the following conditions hold:<\/span><\/p>\n<p><span style=\"font-weight: 400\">i. All the tables are large.<\/span><span style=\"font-weight: 400\"><br \/>\n<\/span><span style=\"font-weight: 400\">ii. All the tables are bucketed on the join columns.<\/span><span style=\"font-weight: 400\"><br \/>\n<\/span><span style=\"font-weight: 400\">iii. The number of buckets in one table is a multiple of the number of buckets in the other table.<\/span><span style=\"font-weight: 400\"><br \/>\n<\/span><span style=\"font-weight: 400\">iv. The tables are not sorted. <\/span><\/p>\n<h2><span style=\"font-weight: 400\">Disadvantages of Bucket Map Join in Hive<\/span><\/h2>\n<p><span style=\"font-weight: 400\">The major disadvantage of the Bucket Map Join is that the tables need to be bucketed on the same columns that the SQL query joins on. 
That implies we cannot use it for queries that join on other columns.<\/span><\/p>\n<h2><span style=\"font-weight: 400\">Tips on <\/span><span style=\"font-weight: 400\">Bucket Map Join\u00a0<\/span><\/h2>\n<p><span style=\"font-weight: 400\">i. First, the tables must be created bucketed on the same join columns. Also, the data must actually be bucketed while it is inserted.<\/span><br \/>\n<span style=\"font-weight: 400\">One way to do this is to set &#8220;hive.enforce.bucketing=true&#8221; before inserting data.<\/span><br \/>\n<span style=\"font-weight: 400\">For example:<\/span><span style=\"font-weight: 400\"><br \/>\n<\/span><span style=\"font-weight: 400\">create table b1(col0 string,col1 string,col2 string,col3 string,col4 string,col5 string,col6 string)<\/span><span style=\"font-weight: 400\"><br \/>\n<\/span><span style=\"font-weight: 400\">clustered by (col0) into 32 buckets;<\/span><span style=\"font-weight: 400\"><br \/>\n<\/span><span style=\"font-weight: 400\">create table b2(col0 string,col1 string,col2 string,col3 string,col4 string,col5 string,col6 string)<\/span><span style=\"font-weight: 400\"><br \/>\n<\/span><span style=\"font-weight: 400\">clustered by (col0) into 8 buckets;<\/span><span style=\"font-weight: 400\"><br \/>\n<\/span><span style=\"font-weight: 400\"><br \/>\n<\/span><span style=\"font-weight: 400\">set hive.enforce.bucketing = true; <\/span><span style=\"font-weight: 400\"><br \/>\n<\/span><span style=\"font-weight: 400\">From passwords insert OVERWRITE \u00a0table b1 select * limit 10000;<\/span><span style=\"font-weight: 400\"><br \/>\n<\/span><span style=\"font-weight: 400\">From passwords insert OVERWRITE \u00a0table b2 select * limit 10000;<\/span><br \/>\n<span style=\"font-weight: 400\">ii. 
Also, hive.optimize.bucketmapjoin must be set to true.<\/span><span style=\"font-weight: 400\"><br \/>\n<\/span><span style=\"font-weight: 400\">set hive.optimize.bucketmapjoin=true;<\/span><span style=\"font-weight: 400\"><br \/>\n<\/span><span style=\"font-weight: 400\">select \/*+ MAPJOIN(b2) *\/ b1.* from b1,b2 where b1.col0=b2.col0;<\/span><\/p>\n<h2><span style=\"font-weight: 400\">Conclusion<\/span><\/h2>\n<p><span style=\"font-weight: 400\">In this article, we covered the Apache Hive Bucket Map Join feature: how it works, its use cases, its disadvantages, and an example. In the next article, we will see <strong>Skew\u00a0Join in Hive<\/strong>. If you have any questions, please ask in the comment section.<\/span><\/p>\n","protected":false},"excerpt":{"rendered":"<p>In the last article, we discuss Map Side Join in\u00a0Hive. Basically, while the tables are large and all the tables used in the join are bucketed on the join columns we use a Bucket&#46;&#46;&#46;<\/p>\n","protected":false},"author":7,"featured_media":10796,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[26],"tags":[2186,2187,3941,5682,5840,7010,14745,15190,15652],"class_list":["post-10720","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-hive","tag-bucket-map-join","tag-bucket-map-join-in-hive","tag-disadvantages-of-bucket-map-join","tag-hive-bucket-map-join","tag-how-bucket-map-join-works","tag-introduction-to-bucket-map-join","tag-tips-on-bucket-map-join","tag-use-case-of-bucket-map-join","tag-what-is-bucket-map-join"],"yoast_head":"<!-- This site is optimized with the Yoast SEO plugin v27.8 - https:\/\/yoast.com\/product\/yoast-seo-wordpress\/ -->\n<title>Bucket Map Join in Hive - Tips &amp; Working - DataFlair<\/title>\n<meta name=\"description\" content=\"What is Bucket Map Join in Hive,How Hive Bucket Map 
Join Works,Use Case of Bucket Map Join,Disadvantages of Bucket Map Join in Hive,Bucket Map Join example\" \/>\n<meta name=\"robots\" content=\"index, follow, max-snippet:-1, max-image-preview:large, max-video-preview:-1\" \/>\n<link rel=\"canonical\" href=\"https:\/\/data-flair.training\/blogs\/bucket-map-join\/\" \/>\n<meta property=\"og:locale\" content=\"en_US\" \/>\n<meta property=\"og:type\" content=\"article\" \/>\n<meta property=\"og:title\" content=\"Bucket Map Join in Hive - Tips &amp; Working - DataFlair\" \/>\n<meta property=\"og:description\" content=\"What is Bucket Map Join in Hive,How Hive Bucket Map Join Works,Use Case of Bucket Map Join,Disadvantages of Bucket Map Join in Hive,Bucket Map Join example\" \/>\n<meta property=\"og:url\" content=\"https:\/\/data-flair.training\/blogs\/bucket-map-join\/\" \/>\n<meta property=\"og:site_name\" content=\"DataFlair\" \/>\n<meta property=\"article:publisher\" content=\"https:\/\/www.facebook.com\/DataFlairWS\/\" \/>\n<meta property=\"article:published_time\" content=\"2018-03-14T00:00:05+00:00\" \/>\n<meta property=\"og:image\" content=\"https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2018\/03\/Bucket-Map-Join-in-Hive-01-1.jpg\" \/>\n\t<meta property=\"og:image:width\" content=\"1200\" \/>\n\t<meta property=\"og:image:height\" content=\"628\" \/>\n\t<meta property=\"og:image:type\" content=\"image\/jpeg\" \/>\n<meta name=\"author\" content=\"DataFlair Team\" \/>\n<meta name=\"twitter:card\" content=\"summary_large_image\" \/>\n<meta name=\"twitter:creator\" content=\"@DataFlairWS\" \/>\n<meta name=\"twitter:site\" content=\"@DataFlairWS\" \/>\n<meta name=\"twitter:label1\" content=\"Written by\" \/>\n\t<meta name=\"twitter:data1\" content=\"DataFlair Team\" \/>\n\t<meta name=\"twitter:label2\" content=\"Est. reading time\" \/>\n\t<meta name=\"twitter:data2\" content=\"3 minutes\" \/>\n<!-- \/ Yoast SEO plugin. 
-->","yoast_head_json":{"title":"Bucket Map Join in Hive - Tips &amp; Working - DataFlair","description":"What is Bucket Map Join in Hive,How Hive Bucket Map Join Works,Use Case of Bucket Map Join,Disadvantages of Bucket Map Join in Hive,Bucket Map Join example","robots":{"index":"index","follow":"follow","max-snippet":"max-snippet:-1","max-image-preview":"max-image-preview:large","max-video-preview":"max-video-preview:-1"},"canonical":"https:\/\/data-flair.training\/blogs\/bucket-map-join\/","og_locale":"en_US","og_type":"article","og_title":"Bucket Map Join in Hive - Tips &amp; Working - DataFlair","og_description":"What is Bucket Map Join in Hive,How Hive Bucket Map Join Works,Use Case of Bucket Map Join,Disadvantages of Bucket Map Join in Hive,Bucket Map Join example","og_url":"https:\/\/data-flair.training\/blogs\/bucket-map-join\/","og_site_name":"DataFlair","article_publisher":"https:\/\/www.facebook.com\/DataFlairWS\/","article_published_time":"2018-03-14T00:00:05+00:00","og_image":[{"width":1200,"height":628,"url":"https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2018\/03\/Bucket-Map-Join-in-Hive-01-1.jpg","type":"image\/jpeg"}],"author":"DataFlair Team","twitter_card":"summary_large_image","twitter_creator":"@DataFlairWS","twitter_site":"@DataFlairWS","twitter_misc":{"Written by":"DataFlair Team","Est. 
reading time":"3 minutes"},"schema":{"@context":"https:\/\/schema.org","@graph":[{"@type":"Article","@id":"https:\/\/data-flair.training\/blogs\/bucket-map-join\/#article","isPartOf":{"@id":"https:\/\/data-flair.training\/blogs\/bucket-map-join\/"},"author":{"name":"DataFlair Team","@id":"https:\/\/data-flair.training\/blogs\/#\/schema\/person\/beb0cab24b7aa54423a3b50e669a9dcd"},"headline":"Bucket Map Join in Hive &#8211; Tips &amp; Working","datePublished":"2018-03-14T00:00:05+00:00","mainEntityOfPage":{"@id":"https:\/\/data-flair.training\/blogs\/bucket-map-join\/"},"wordCount":630,"commentCount":3,"publisher":{"@id":"https:\/\/data-flair.training\/blogs\/#organization"},"image":{"@id":"https:\/\/data-flair.training\/blogs\/bucket-map-join\/#primaryimage"},"thumbnailUrl":"https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2018\/03\/Bucket-Map-Join-in-Hive-01-1.jpg","keywords":["Bucket Map Join","Bucket Map Join in Hive","Disadvantages of Bucket Map Join","Hive Bucket Map Join","How Bucket Map Join Works","Introduction to Bucket Map Join","Tips on Bucket Map Join","Use Case of Bucket Map Join","What is Bucket Map Join"],"articleSection":["Hive Tutorials"],"inLanguage":"en-US","potentialAction":[{"@type":"CommentAction","name":"Comment","target":["https:\/\/data-flair.training\/blogs\/bucket-map-join\/#respond"]}]},{"@type":"WebPage","@id":"https:\/\/data-flair.training\/blogs\/bucket-map-join\/","url":"https:\/\/data-flair.training\/blogs\/bucket-map-join\/","name":"Bucket Map Join in Hive - Tips &amp; Working - 
DataFlair","isPartOf":{"@id":"https:\/\/data-flair.training\/blogs\/#website"},"primaryImageOfPage":{"@id":"https:\/\/data-flair.training\/blogs\/bucket-map-join\/#primaryimage"},"image":{"@id":"https:\/\/data-flair.training\/blogs\/bucket-map-join\/#primaryimage"},"thumbnailUrl":"https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2018\/03\/Bucket-Map-Join-in-Hive-01-1.jpg","datePublished":"2018-03-14T00:00:05+00:00","description":"What is Bucket Map Join in Hive,How Hive Bucket Map Join Works,Use Case of Bucket Map Join,Disadvantages of Bucket Map Join in Hive,Bucket Map Join example","breadcrumb":{"@id":"https:\/\/data-flair.training\/blogs\/bucket-map-join\/#breadcrumb"},"inLanguage":"en-US","potentialAction":[{"@type":"ReadAction","target":["https:\/\/data-flair.training\/blogs\/bucket-map-join\/"]}]},{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/data-flair.training\/blogs\/bucket-map-join\/#primaryimage","url":"https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2018\/03\/Bucket-Map-Join-in-Hive-01-1.jpg","contentUrl":"https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2018\/03\/Bucket-Map-Join-in-Hive-01-1.jpg","width":1200,"height":628,"caption":"What is Bucket Map Join in Hive"},{"@type":"BreadcrumbList","@id":"https:\/\/data-flair.training\/blogs\/bucket-map-join\/#breadcrumb","itemListElement":[{"@type":"ListItem","position":1,"name":"Blog Home","item":"https:\/\/data-flair.training\/blogs\/"},{"@type":"ListItem","position":2,"name":"Hive Tutorials","item":"https:\/\/data-flair.training\/blogs\/category\/hive\/"},{"@type":"ListItem","position":3,"name":"Bucket Map Join in Hive &#8211; Tips &amp; Working"}]},{"@type":"WebSite","@id":"https:\/\/data-flair.training\/blogs\/#website","url":"https:\/\/data-flair.training\/blogs\/","name":"DataFlair","description":"Learn Today. 
Lead Tomorrow.","publisher":{"@id":"https:\/\/data-flair.training\/blogs\/#organization"},"potentialAction":[{"@type":"SearchAction","target":{"@type":"EntryPoint","urlTemplate":"https:\/\/data-flair.training\/blogs\/?s={search_term_string}"},"query-input":{"@type":"PropertyValueSpecification","valueRequired":true,"valueName":"search_term_string"}}],"inLanguage":"en-US"},{"@type":"Organization","@id":"https:\/\/data-flair.training\/blogs\/#organization","name":"DataFlair","url":"https:\/\/data-flair.training\/blogs\/","logo":{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/data-flair.training\/blogs\/#\/schema\/logo\/image\/","url":"https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2016\/07\/Data-Flair.png","contentUrl":"https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2016\/07\/Data-Flair.png","width":106,"height":48,"caption":"DataFlair"},"image":{"@id":"https:\/\/data-flair.training\/blogs\/#\/schema\/logo\/image\/"},"sameAs":["https:\/\/www.facebook.com\/DataFlairWS\/","https:\/\/x.com\/DataFlairWS","https:\/\/www.linkedin.com\/company\/dataflair-web-services-pvt-ltd\/","https:\/\/www.youtube.com\/user\/DataFlairWS"]},{"@type":"Person","@id":"https:\/\/data-flair.training\/blogs\/#\/schema\/person\/beb0cab24b7aa54423a3b50e669a9dcd","name":"DataFlair Team","image":{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/secure.gravatar.com\/avatar\/c322416204232f4dd97ef3901b0a499a5d34d7ba7fe333f4bfe53a907873d293?s=96&d=mm&r=g","url":"https:\/\/secure.gravatar.com\/avatar\/c322416204232f4dd97ef3901b0a499a5d34d7ba7fe333f4bfe53a907873d293?s=96&d=mm&r=g","contentUrl":"https:\/\/secure.gravatar.com\/avatar\/c322416204232f4dd97ef3901b0a499a5d34d7ba7fe333f4bfe53a907873d293?s=96&d=mm&r=g","caption":"DataFlair Team"},"description":"DataFlair Team specializes in creating clear, actionable content on programming, Java, Python, C++, DSA, AI, ML, data Science, Android, Flutter, MERN, Web Development, and technology. 
Backed by industry expertise, we make learning easy and career-oriented for beginners and pros alike.","url":"https:\/\/data-flair.training\/blogs\/author\/dfteam3\/"}]}},"amp_enabled":true,"_links":{"self":[{"href":"https:\/\/data-flair.training\/blogs\/wp-json\/wp\/v2\/posts\/10720","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/data-flair.training\/blogs\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/data-flair.training\/blogs\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/data-flair.training\/blogs\/wp-json\/wp\/v2\/users\/7"}],"replies":[{"embeddable":true,"href":"https:\/\/data-flair.training\/blogs\/wp-json\/wp\/v2\/comments?post=10720"}],"version-history":[{"count":0,"href":"https:\/\/data-flair.training\/blogs\/wp-json\/wp\/v2\/posts\/10720\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/data-flair.training\/blogs\/wp-json\/wp\/v2\/media\/10796"}],"wp:attachment":[{"href":"https:\/\/data-flair.training\/blogs\/wp-json\/wp\/v2\/media?parent=10720"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/data-flair.training\/blogs\/wp-json\/wp\/v2\/categories?post=10720"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/data-flair.training\/blogs\/wp-json\/wp\/v2\/tags?post=10720"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}