

{"id":10581,"date":"2018-03-14T00:00:27","date_gmt":"2018-03-14T00:00:27","guid":{"rendered":"https:\/\/data-flair.training\/blogs\/?p=10581"},"modified":"2018-03-14T00:00:27","modified_gmt":"2018-03-14T00:00:27","slug":"bucketing-in-hive","status":"publish","type":"post","link":"https:\/\/data-flair.training\/blogs\/bucketing-in-hive\/","title":{"rendered":"Bucketing in Hive &#8211; Creation of Bucketed Table in Hive"},"content":{"rendered":"<p><span style=\"font-weight: 400\">In <strong>Apache<\/strong> <strong>Hive<\/strong>, bucketing is a technique for decomposing table data sets into more manageable parts. However, there is much more to learn about Bucketing in Hive. <\/span><\/p>\n<p><span style=\"font-weight: 400\">So, in this article, we will cover the whole concept of Bucketing in Hive. <\/span><span style=\"font-weight: 400\">It includes one of the major questions: why do we even need Bucketing in Hive when we already have Hive Partitioning? <\/span><\/p>\n<p>Finally, we will discuss the Features of Bucketing in Hive, the Advantages of Bucketing in Hive, the Limitations of Bucketing in Hive, and an Example Use Case of Bucketing in Hive, with some Hive Bucketing examples.<\/p>\n<h2><span style=\"font-weight: 400\">What is Bucketing in Hive<\/span><\/h2>\n<p><span style=\"font-weight: 400\">Apart from partitioning, Apache Hive offers another technique for decomposing table data sets into more manageable parts. That technique is what we call Bucketing in Hive.<\/span><\/p>\n<h2><span style=\"font-weight: 400\">Why Bucketing?<\/span><\/h2>\n<p><span style=\"font-weight: 400\">Basically, the concept of <strong>Hive Partitioning<\/strong> provides a way of segregating Hive table data into multiple files\/directories. However, it only gives effective results in a few scenarios. 
Such as:<\/span><span style=\"font-weight: 400\"><br \/>\n<\/span><span style=\"font-weight: 400\"><br \/>\n<\/span><span style=\"font-weight: 400\">&#8211; When there is a limited number of partitions.<\/span><span style=\"font-weight: 400\"><br \/>\n<\/span><span style=\"font-weight: 400\">&#8211; Or, when the partitions are of comparatively equal size.<\/span><br \/>\n<span style=\"font-weight: 400\"><br \/>\n<\/span><span style=\"font-weight: 400\">However, this is not possible in all scenarios.<\/span> For example, when we partition our tables by geographic location, such as country, a few bigger countries will have large partitions (e.g., 4-5 countries alone contributing 70-80% of the total data).<\/p>\n<p>Meanwhile, the data of small countries will create small partitions (all the remaining countries in the world may contribute just 20-30% of the total data). In that case, Partitioning will not be ideal.<\/p>\n<p><span style=\"font-weight: 400\">To solve that problem of over-partitioning, Hive offers the Bucketing concept. It is another effective technique for decomposing table data sets into more manageable parts.<\/span><\/p>\n<h2><span style=\"font-weight: 400\">Features of Bucketing in Hive<\/span><\/h2>\n<p><span style=\"font-weight: 400\">Basically, this concept is based on applying a hash function to the bucketed column, followed by mod (by the total number of buckets): bucket number = hash_function(bucketing_column) mod num_buckets. <\/span><\/p>\n<p><span style=\"font-weight: 400\">i. The hash_function depends on the type of the bucketing column.<\/span><br \/>\n<span style=\"font-weight: 400\">ii. Records with the same value in the bucketed column will always be stored in the same bucket.<\/span><br \/>\n<span style=\"font-weight: 400\">iii. To divide the table into buckets, we use the CLUSTERED BY clause.<\/span><br \/>\n<span style=\"font-weight: 400\">iv. Physically, each bucket is just a file in the table directory, and bucket numbering is 1-based.<\/span><br \/>\n<span style=\"font-weight: 400\">v. 
Bucketing can be done on Hive tables along with Partitioning, or even without partitioning.<\/span><br \/>\n<span style=\"font-weight: 400\">vi. Bucketed tables will create almost equally distributed data file parts.<\/span><\/p>\n<h2><span style=\"font-weight: 400\">Advantages of Bucketing in Hive<\/span><\/h2>\n<p><span style=\"font-weight: 400\">i. Compared with non-bucketed tables, bucketed tables offer efficient sampling.<\/span><br \/>\n<span style=\"font-weight: 400\">ii. Map-side joins will be faster on bucketed tables than on non-bucketed tables, as the data files are roughly equal-sized parts.<\/span><br \/>\niii. Similar to partitioning, bucketed tables also offer faster query responses than non-bucketed tables.<br \/>\n<span style=\"font-weight: 400\">iv. This concept offers the flexibility to keep the records in each bucket sorted by one or more columns.<\/span><br \/>\n<span style=\"font-family: Verdana, Geneva, sans-serif\">v. Since the join of each bucket then becomes an efficient merge-sort, this makes map-side joins even more efficient.<\/span><\/p>\n<h2><span style=\"font-weight: 400\">Limitations of Bucketing in Hive<\/span><\/h2>\n<p>i. Specifying bucketing doesn\u2019t by itself ensure that the table is properly populated.<br \/>\n<span style=\"font-weight: 400\">ii. 
So, we need to handle loading data into the buckets ourselves.<\/span><\/p>\n<h2><span style=\"font-weight: 400\">Example Use Case for Bucketing in Hive<\/span><\/h2>\n<p><span style=\"font-weight: 400\">To understand the remaining features of Hive Bucketing, let\u2019s see an example use case: creating buckets for the sample user records file used for testing in this post.<\/span><span style=\"font-weight: 400\"><br \/>\n<\/span><br \/>\n<span style=\"font-weight: 400\">first_name,last_name, address, country, city, state, post,phone1,phone2, email, web<\/span><span style=\"font-weight: 400\"><br \/>\n<\/span><span style=\"font-weight: 400\">Rebbecca, Didio, 171 E 24th St, AU, Leith, TA, 7315, 03-8174-9123, 0458-665-290, rebbecca.didio@didio.com.au,http:\/\/www.brandtjonathanfesq.com.au<\/span><span style=\"font-weight: 400\"><br \/>\n<\/span><span style=\"font-weight: 400\"><br \/>\n<\/span><span style=\"font-weight: 400\">Hence, let\u2019s create the table partitioned by country, bucketed by state, and sorted in ascending order of city.<\/span><\/p>\n<div id=\"attachment_10747\" style=\"width: 1210px\" class=\"wp-caption aligncenter\"><a href=\"https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2018\/03\/Example-Use-Case-for-Hive-Bucketing-01.jpg\"><img loading=\"lazy\" decoding=\"async\" aria-describedby=\"caption-attachment-10747\" class=\"wp-image-10747 size-full\" src=\"https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2018\/03\/Example-Use-Case-for-Hive-Bucketing-01.jpg\" alt=\"Example for Hive Bucketing\" width=\"1200\" height=\"628\" srcset=\"https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2018\/03\/Example-Use-Case-for-Hive-Bucketing-01.jpg 1200w, https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2018\/03\/Example-Use-Case-for-Hive-Bucketing-01-150x79.jpg 150w, https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2018\/03\/Example-Use-Case-for-Hive-Bucketing-01-300x157.jpg 300w, 
https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2018\/03\/Example-Use-Case-for-Hive-Bucketing-01-768x402.jpg 768w, https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2018\/03\/Example-Use-Case-for-Hive-Bucketing-01-1024x536.jpg 1024w\" sizes=\"auto, (max-width: 1200px) 100vw, 1200px\" \/><\/a><p id=\"caption-attachment-10747\" class=\"wp-caption-text\">Example for Hive Bucketing<\/p><\/div>\n<h3>a. Creation of Bucketed Tables<\/h3>\n<p><span style=\"font-weight: 400\">We can create bucketed tables with the help of the CLUSTERED BY clause and the optional SORTED BY clause in the CREATE TABLE statement. We can create a bucketed_user table that meets the above requirement with the HiveQL below.<\/span><\/p>\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"null\">CREATE TABLE bucketed_user(\n        firstname VARCHAR(64),\n        lastname  VARCHAR(64),\n        address   STRING,\n        city      VARCHAR(64),\n        state     VARCHAR(64),\n        post      STRING,\n        phone1    VARCHAR(64),\n        phone2    STRING,\n        email     STRING,\n        web       STRING\n        )\n        COMMENT 'A bucketed sorted user table'\n        PARTITIONED BY (country VARCHAR(64))\n        CLUSTERED BY (state) SORTED BY (city) INTO 32 BUCKETS\n        STORED AS SEQUENCEFILE;<\/pre>\n<p><span style=\"font-weight: 400\">As shown in the above code, the bucketing column (state) and the sorting column (city) are included in the table\u2019s column definition,<\/span> unlike the partition column (country), which is not included in the column definition.<\/p>\n<h3>b. Inserting data Into Bucketed Tables<\/h3>\n<p><span style=\"font-weight: 400\">Similar to partitioned tables, we cannot directly load bucketed tables with the LOAD DATA (LOCAL) INPATH command.<\/span> Instead, to populate the bucketed tables we need to use an INSERT OVERWRITE TABLE \u2026 SELECT \u2026 FROM statement that reads from another table.<\/p>\n<p>For this, we will create a temporary table in Hive with all the columns of the input file, and from that table we will copy the data into our target bucketed table.<span style=\"font-weight: 400\"><br \/>\n<\/span><span style=\"font-weight: 400\"><br \/>\n<\/span><span style=\"font-weight: 400\">Let\u2019s suppose we have created the temp_user temporary table.<\/span> The HiveQL below populates the bucketed table from the temp_user table.<\/p>\n<p><span style=\"font-weight: 400\">In addition, we need to set the property hive.enforce.bucketing = true, so that Hive knows to create the number of buckets declared in the table definition to 
populate the bucketed table.<\/span><span style=\"font-weight: 400\"><br \/>\n<\/span><span style=\"font-weight: 400\"><br \/>\n<\/span><\/p>\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"null\">set hive.enforce.bucketing = true;\n\nINSERT OVERWRITE TABLE bucketed_user PARTITION (country)\n        SELECT firstname,\n               lastname,\n               address,\n               city,\n               state,\n               post,\n               phone1,\n               phone2,\n               email,\n               web,\n               country\n        FROM temp_user;<\/pre>\n<p><b>Some important points to note:<\/b><\/p>\n<p><span style=\"font-weight: 400\">i. The property hive.enforce.bucketing = true plays a role similar to that of the hive.exec.dynamic.partition=true property in partitioning. By setting this property, we enable dynamic bucketing while loading data into the Hive table.<\/span><\/p>\n<p><span style=\"font-weight: 400\">ii. With this property set, Hive automatically sets the number of reduce tasks equal to the number of buckets mentioned in the table definition (for example, 32 in our case). It also automatically selects the CLUSTERED BY column from the table definition.<\/span><\/p>\n<p><span style=\"font-weight: 400\">iii. If we do not set this property in the Hive session, we have to convey the same information to Hive manually: the number of reduce tasks to run (for example, in our case, by using set mapred.reduce.tasks=32), and DISTRIBUTE BY (state) and SORT BY (city) clauses at the end of the above INSERT \u2026 statement. Note that Hive does not accept CLUSTER BY and SORT BY together in the same query.<\/span><\/p>\n<h3>c. 
Solution For Example Use Case<\/h3>\n<p><span style=\"font-weight: 400\">Below is the combined HiveQL, along with the script required to create the temporary Hive table. Let\u2019s save this HiveQL into bucketed_user_creation.hql. Also, save the input file provided in the example use case section into the user_table.txt file in the home directory.<\/span><\/p>\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"null\">set hive.exec.dynamic.partition=true;\nset hive.exec.dynamic.partition.mode=nonstrict;\nset hive.exec.max.dynamic.partitions.pernode=1000;\nset hive.enforce.bucketing = true;\n\nDROP TABLE IF EXISTS bucketed_user;\n\nCREATE TEMPORARY TABLE temp_user(\n       firstname VARCHAR(64),\n       lastname  VARCHAR(64),\n       address   STRING,\n       country   VARCHAR(64),\n       city      VARCHAR(64),\n       state     VARCHAR(64),\n       post      STRING,\n       phone1    VARCHAR(64),\n       phone2    STRING,\n       email     STRING,\n       web       STRING\n       )\n       ROW FORMAT DELIMITED\n       FIELDS TERMINATED BY ','\n       LINES TERMINATED BY '\\n'\n       STORED AS TEXTFILE;\n\nLOAD DATA LOCAL INPATH '\/home\/user\/user_table.txt' INTO TABLE temp_user;\n\nCREATE TABLE bucketed_user(\n       firstname VARCHAR(64),\n       lastname  VARCHAR(64),\n       address   STRING,\n       city      VARCHAR(64),\n       state     VARCHAR(64),\n       post      STRING,\n       phone1    VARCHAR(64),\n       phone2    STRING,\n       email     STRING,\n       web       STRING\n       )\n       COMMENT 'A bucketed sorted user table'\n       PARTITIONED BY (country VARCHAR(64))\n       CLUSTERED BY (state) SORTED BY (city) INTO 32 BUCKETS\n       STORED AS SEQUENCEFILE;<\/pre>\n<p>&nbsp;<\/p>\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"null\">set hive.enforce.bucketing = true;\nINSERT OVERWRITE TABLE bucketed_user PARTITION (country)\n       SELECT firstname,\n              lastname,\n              address,\n              city,\n              state,\n              post,\n              phone1,\n              phone2,\n              email,\n              web,\n              country\n       FROM temp_user;<\/pre>\n<h3>d.\u00a0Output<\/h3>\n<p><span style=\"font-weight: 400\">Now, let\u2019s execute this script in Hive. 
Also, see the output of the above script execution below.<\/span><a href=\"https:\/\/data-flair.training\/blogs\/best-hive-books\/\"><strong><br \/>\n<\/strong><\/a><span style=\"font-weight: 400\"><br \/>\n<\/span><span style=\"font-weight: 400\">user@tri03ws-386:~$ hive -f bucketed_user_creation.hql<\/span><span style=\"font-weight: 400\"><br \/>\n<\/span><span style=\"font-weight: 400\">Logging initialized using configuration in jar:file:\/home\/user\/bigdata\/apache-hive-0.14.0-bin\/lib\/hive-common-0.14.0.jar!\/hive-log4j.properties<\/span><span style=\"font-weight: 400\"><br \/>\n<\/span><span style=\"font-weight: 400\">OK<\/span><span style=\"font-weight: 400\"><br \/>\n<\/span><span style=\"font-weight: 400\">Time taken: 12.144 
seconds<\/span><span style=\"font-weight: 400\"><br \/>\n<\/span><span style=\"font-weight: 400\">OK<\/span><span style=\"font-weight: 400\"><br \/>\n<\/span><span style=\"font-weight: 400\">Time taken: 0.146 seconds<\/span><span style=\"font-weight: 400\"><br \/>\n<\/span><span style=\"font-weight: 400\">Loading data to table default.temp_user<\/span><span style=\"font-weight: 400\"><br \/>\n<\/span><span style=\"font-weight: 400\">Table default.temp_user stats: [numFiles=1, totalSize=283212]<\/span><span style=\"font-weight: 400\"><br \/>\n<\/span><span style=\"font-weight: 400\">OK<\/span><span style=\"font-weight: 400\"><br \/>\n<\/span><span style=\"font-weight: 400\">Time taken: 0.21 seconds<\/span><span style=\"font-weight: 400\"><br \/>\n<\/span><span style=\"font-weight: 400\">OK<\/span><span style=\"font-weight: 400\"><br \/>\n<\/span><span style=\"font-weight: 400\">Time taken: 0.5 seconds<\/span><span style=\"font-weight: 400\"><br \/>\n<\/span><span style=\"font-weight: 400\">Query ID = user_20141222163030_3f024f2b-e682-4b08-b25c-7775d7af4134<\/span><\/p>\n<p><span style=\"font-weight: 400\">Total jobs = 1<\/span><span style=\"font-weight: 400\"><br \/>\n<\/span><span style=\"font-weight: 400\">Launching Job 1 out of 1<\/span><span style=\"font-weight: 400\"><br \/>\n<\/span><span style=\"font-weight: 400\">Number of reduce tasks determined at compile time: 32<\/span><span style=\"font-weight: 400\"><br \/>\n<\/span><span style=\"font-weight: 400\">In order to change the average load for a reducer (in bytes):<\/span><span style=\"font-weight: 400\"><br \/>\n<\/span><span style=\"font-weight: 400\"> \u00a0set hive.exec.reducers.bytes.per.reducer=&lt;number&gt;<\/span><span style=\"font-weight: 400\"><br \/>\n<\/span><span style=\"font-weight: 400\">In order to limit the maximum number of reducers:<\/span><span style=\"font-weight: 400\"><br \/>\n<\/span><span style=\"font-weight: 400\"> \u00a0set hive.exec.reducers.max=&lt;number&gt;<\/span><span 
style=\"font-weight: 400\"><br \/>\n<\/span><span style=\"font-weight: 400\">In order to set a constant number of reducers:<\/span><span style=\"font-weight: 400\"><br \/>\n<\/span><span style=\"font-weight: 400\"> \u00a0set mapreduce.job.reduces=&lt;number&gt;<\/span><span style=\"font-weight: 400\"><br \/>\n<\/span><span style=\"font-weight: 400\">Starting Job = job_1419243806076_0002, Tracking URL = http:\/\/tri03ws-386:8088\/proxy\/application_1419243806076_0002\/<\/span><span style=\"font-weight: 400\"><br \/>\n<\/span><span style=\"font-weight: 400\">Kill Command = \/home\/user\/bigdata\/hadoop-2.6.0\/bin\/hadoop job \u00a0-kill job_1419243806076_0002<\/span><span style=\"font-weight: 400\"><br \/>\n<\/span><span style=\"font-weight: 400\">Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 32<\/span><br \/>\n<span style=\"font-weight: 400\"><br \/>\n<\/span><span style=\"font-weight: 400\">2014-12-22 16:30:36,164 Stage-1 map = 0%, \u00a0reduce = 0%<\/span><span style=\"font-weight: 400\"><br \/>\n<\/span><span style=\"font-weight: 400\">2014-12-22 16:31:09,770 Stage-1 map = 100%, \u00a0reduce = 0%, Cumulative CPU 1.66 sec<\/span><span style=\"font-weight: 400\"><br \/>\n<\/span><span style=\"font-weight: 400\">2014-12-22 16:32:10,368 Stage-1 map = 100%, \u00a0reduce = 0%, Cumulative CPU 1.66 sec<\/span><span style=\"font-weight: 400\"><br \/>\n<\/span><span style=\"font-weight: 400\">2014-12-22 16:32:28,037 Stage-1 map = 100%, \u00a0reduce = 13%, Cumulative CPU 3.19 sec<\/span><span style=\"font-weight: 400\"><br \/>\n<\/span><span style=\"font-weight: 400\">2014-12-22 16:32:36,480 Stage-1 map = 100%, \u00a0reduce = 14%, Cumulative CPU 7.06 sec<\/span><span style=\"font-weight: 400\"><br \/>\n<\/span><span style=\"font-weight: 400\">2014-12-22 16:32:40,317 Stage-1 map = 100%, \u00a0reduce = 19%, Cumulative CPU 7.63 sec<\/span><span style=\"font-weight: 400\"><br \/>\n<\/span><span style=\"font-weight: 400\">2014-12-22 16:33:40,691 
Stage-1 map = 100%, \u00a0reduce = 19%, Cumulative CPU 12.28 sec<\/span><span style=\"font-weight: 400\"><br \/>\n<\/span><span style=\"font-weight: 400\">2014-12-22 16:33:54,846 Stage-1 map = 100%, \u00a0reduce = 31%, Cumulative CPU 17.45 sec<\/span><span style=\"font-weight: 400\"><br \/>\n<\/span><span style=\"font-weight: 400\">2014-12-22 16:33:58,642 Stage-1 map = 100%, \u00a0reduce = 38%, Cumulative CPU 21.69 sec<\/span><span style=\"font-weight: 400\"><br \/>\n<\/span><span style=\"font-weight: 400\">2014-12-22 16:34:52,731 Stage-1 map = 100%, \u00a0reduce = 56%, Cumulative CPU 32.01 sec<\/span><span style=\"font-weight: 400\"><br \/>\n<\/span><span style=\"font-weight: 400\">2014-12-22 16:35:21,369 Stage-1 map = 100%, \u00a0reduce = 63%, Cumulative CPU 35.08 sec<\/span><span style=\"font-weight: 400\"><br \/>\n<\/span><span style=\"font-weight: 400\">2014-12-22 16:35:22,493 Stage-1 map = 100%, \u00a0reduce = 75%, Cumulative CPU 41.45 sec<\/span><span style=\"font-weight: 400\"><br \/>\n<\/span><span style=\"font-weight: 400\">2014-12-22 16:35:53,559 Stage-1 map = 100%, \u00a0reduce = 94%, Cumulative CPU 51.14 sec<\/span><span style=\"font-weight: 400\"><br \/>\n<\/span><span style=\"font-weight: 400\">2014-12-22 16:36:14,301 Stage-1 map = 100%, \u00a0reduce = 100%, Cumulative CPU 54.13 sec<\/span><br \/>\n<span style=\"font-weight: 400\"><br \/>\n<\/span><span style=\"font-weight: 400\">MapReduce Total cumulative CPU time: 54 seconds 130 msec<\/span><span style=\"font-weight: 400\"><br \/>\n<\/span><span style=\"font-weight: 400\">Ended Job = job_1419243806076_0002<\/span><span style=\"font-weight: 400\"><br \/>\n<\/span><span style=\"font-weight: 400\">Loading data to table default.bucketed_user partition (country=null)<\/span><span style=\"font-weight: 400\"><br \/>\n<\/span> <span style=\"font-weight: 400\"> Time taken for load dynamic partitions : 2421<\/span><span style=\"font-weight: 400\"><br \/>\n<\/span> <span style=\"font-weight: 400\">Loading 
partition {country=AU}<\/span><span style=\"font-weight: 400\"><br \/>\n<\/span> <span style=\"font-weight: 400\">Loading partition {country=country}<\/span><span style=\"font-weight: 400\"><br \/>\n<\/span> <span style=\"font-weight: 400\">Loading partition {country=US}<\/span><span style=\"font-weight: 400\"><br \/>\n<\/span> <span style=\"font-weight: 400\">Loading partition {country=UK}<\/span><span style=\"font-weight: 400\"><br \/>\n<\/span> <span style=\"font-weight: 400\">Loading partition {country=CA}<\/span><span style=\"font-weight: 400\"><br \/>\n<\/span> <span style=\"font-weight: 400\"> Time taken for adding to write entity : 17<\/span><span style=\"font-weight: 400\"><br \/>\n<\/span><span style=\"font-weight: 400\">Partition default.bucketed_user{country=AU} stats: [numFiles=32, numRows=500, totalSize=78268, rawDataSize=67936]<\/span><span style=\"font-weight: 400\"><br \/>\n<\/span><span style=\"font-weight: 400\">Partition default.bucketed_user{country=CA} stats: [numFiles=32, numRows=500, totalSize=76564, rawDataSize=66278]<\/span><span style=\"font-weight: 400\"><br \/>\n<\/span><span style=\"font-weight: 400\">Partition default.bucketed_user{country=UK} stats: [numFiles=32, numRows=500, totalSize=85604, rawDataSize=75292]<\/span><span style=\"font-weight: 400\"><br \/>\n<\/span><span style=\"font-weight: 400\">Partition default.bucketed_user{country=US} stats: [numFiles=32, numRows=500, totalSize=75468, rawDataSize=65383]<\/span><span style=\"font-weight: 400\"><br \/>\n<\/span><span style=\"font-weight: 400\">Partition default.bucketed_user{country=country} stats: [numFiles=32, numRows=1, totalSize=2865, rawDataSize=68]<\/span><span style=\"font-weight: 400\"><br \/>\n<\/span><span style=\"font-weight: 400\">MapReduce Jobs Launched: <\/span><span style=\"font-weight: 400\"><br \/>\n<\/span><span style=\"font-weight: 400\">Stage-Stage-1: Map: 1 \u00a0Reduce: 32 Cumulative CPU: 54.13 sec \u00a0\u00a0HDFS Read: 283505 HDFS Write: 316247 
SUCCESS<\/span><span style=\"font-weight: 400\"><br \/>\n<\/span><span style=\"font-weight: 400\">Total MapReduce CPU Time Spent: 54 seconds 130 msec<\/span><span style=\"font-weight: 400\"><br \/>\n<\/span><span style=\"font-weight: 400\">OK<\/span><span style=\"font-weight: 400\"><br \/>\n<\/span><span style=\"font-weight: 400\">Time taken: 396.486 seconds<\/span><span style=\"font-weight: 400\"><br \/>\n<\/span><span style=\"font-weight: 400\">user@tri03ws-386:~$<\/span><br \/>\n<span style=\"font-weight: 400\">Hence, the output above shows that the MapReduce job launched 32 reduce tasks, one for each of the 32 buckets, and created four country partitions (AU, CA, UK, and US). The log also lists a fifth partition, country=country, holding a single row; this most likely comes from the header line of the input file being loaded as data.<\/span><\/p>\n<h2>Conclusion<\/h2>\n<p>As a result, we have seen the whole concept of Hive Bucketing. It includes why we need Hive Bucketing even after the Hive Partitioning concept, the Features of Bucketing in Hive, the Advantages of Bucketing in Hive,\u00a0the Limitations of Bucketing in Hive, and an Example Use Case of Bucketing in Hive.<\/p>\n<p>We have also seen Hive Bucketing without Partition, how to decide the number of buckets in Hive, Hive Bucketing with examples, and how to insert into a bucketed table in Hive. Still, if you have any doubts, feel free to ask in the comment section.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>In Apache Hive, for decomposing table data sets into more manageable parts, it uses Hive Bucketing concept. However, there are much more to learn about Bucketing in Hive. 
So, in this article, we will&#46;&#46;&#46;<\/p>\n","protected":false},"author":7,"featured_media":10744,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[26],"tags":[326,3168,4599,5681,5684,5685,8250,15751,16134],"class_list":["post-10581","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-hive","tag-advantages-of-bucketing-in-hive","tag-creation-of-bucketed-tables","tag-features-of-hive-bucketing","tag-hive-bucket-external-table","tag-hive-bucketing-with-examples","tag-hive-bucketing-without-partition","tag-limitations-of-hive-bucketing","tag-what-is-hive-bucketing","tag-why-bucketing"],"yoast_head":"<!-- This site is optimized with the Yoast SEO plugin v27.8 - https:\/\/yoast.com\/product\/yoast-seo-wordpress\/ -->\n<title>Bucketing in Hive - Creation of Bucketed Table in Hive - DataFlair<\/title>\n<meta name=\"description\" content=\"What is Bucketing in Hive,Features of Hive Bucketing, Why Bucketing in hive used,Advantages of Bucketing in Hive,Limitations of Hive Bucketing with examples\" \/>\n<meta name=\"robots\" content=\"index, follow, max-snippet:-1, max-image-preview:large, max-video-preview:-1\" \/>\n<link rel=\"canonical\" href=\"https:\/\/data-flair.training\/blogs\/bucketing-in-hive\/\" \/>\n<meta property=\"og:locale\" content=\"en_US\" \/>\n<meta property=\"og:type\" content=\"article\" \/>\n<meta property=\"og:title\" content=\"Bucketing in Hive - Creation of Bucketed Table in Hive - DataFlair\" \/>\n<meta property=\"og:description\" content=\"What is Bucketing in Hive,Features of Hive Bucketing, Why Bucketing in hive used,Advantages of Bucketing in Hive,Limitations of Hive Bucketing with examples\" \/>\n<meta property=\"og:url\" content=\"https:\/\/data-flair.training\/blogs\/bucketing-in-hive\/\" \/>\n<meta property=\"og:site_name\" content=\"DataFlair\" \/>\n<meta property=\"article:publisher\" 
content=\"https:\/\/www.facebook.com\/DataFlairWS\/\" \/>\n<meta property=\"article:published_time\" content=\"2018-03-14T00:00:27+00:00\" \/>\n<meta property=\"og:image\" content=\"https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2018\/03\/Hive-Bucketing-01.jpg\" \/>\n\t<meta property=\"og:image:width\" content=\"1200\" \/>\n\t<meta property=\"og:image:height\" content=\"628\" \/>\n\t<meta property=\"og:image:type\" content=\"image\/jpeg\" \/>\n<meta name=\"author\" content=\"DataFlair Team\" \/>\n<meta name=\"twitter:card\" content=\"summary_large_image\" \/>\n<meta name=\"twitter:creator\" content=\"@DataFlairWS\" \/>\n<meta name=\"twitter:site\" content=\"@DataFlairWS\" \/>\n<meta name=\"twitter:label1\" content=\"Written by\" \/>\n\t<meta name=\"twitter:data1\" content=\"DataFlair Team\" \/>\n\t<meta name=\"twitter:label2\" content=\"Est. reading time\" \/>\n\t<meta name=\"twitter:data2\" content=\"11 minutes\" \/>\n<!-- \/ Yoast SEO plugin. -->","yoast_head_json":{"title":"Bucketing in Hive - Creation of Bucketed Table in Hive - DataFlair","description":"What is Bucketing in Hive,Features of Hive Bucketing, Why Bucketing in hive used,Advantages of Bucketing in Hive,Limitations of Hive Bucketing with examples","robots":{"index":"index","follow":"follow","max-snippet":"max-snippet:-1","max-image-preview":"max-image-preview:large","max-video-preview":"max-video-preview:-1"},"canonical":"https:\/\/data-flair.training\/blogs\/bucketing-in-hive\/","og_locale":"en_US","og_type":"article","og_title":"Bucketing in Hive - Creation of Bucketed Table in Hive - DataFlair","og_description":"What is Bucketing in Hive,Features of Hive Bucketing, Why Bucketing in hive used,Advantages of Bucketing in Hive,Limitations of Hive Bucketing with 
examples","og_url":"https:\/\/data-flair.training\/blogs\/bucketing-in-hive\/","og_site_name":"DataFlair","article_publisher":"https:\/\/www.facebook.com\/DataFlairWS\/","article_published_time":"2018-03-14T00:00:27+00:00","og_image":[{"width":1200,"height":628,"url":"https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2018\/03\/Hive-Bucketing-01.jpg","type":"image\/jpeg"}],"author":"DataFlair Team","twitter_card":"summary_large_image","twitter_creator":"@DataFlairWS","twitter_site":"@DataFlairWS","twitter_misc":{"Written by":"DataFlair Team","Est. reading time":"11 minutes"},"schema":{"@context":"https:\/\/schema.org","@graph":[{"@type":"Article","@id":"https:\/\/data-flair.training\/blogs\/bucketing-in-hive\/#article","isPartOf":{"@id":"https:\/\/data-flair.training\/blogs\/bucketing-in-hive\/"},"author":{"name":"DataFlair Team","@id":"https:\/\/data-flair.training\/blogs\/#\/schema\/person\/beb0cab24b7aa54423a3b50e669a9dcd"},"headline":"Bucketing in Hive &#8211; Creation of Bucketed Table in Hive","datePublished":"2018-03-14T00:00:27+00:00","mainEntityOfPage":{"@id":"https:\/\/data-flair.training\/blogs\/bucketing-in-hive\/"},"wordCount":2018,"commentCount":2,"publisher":{"@id":"https:\/\/data-flair.training\/blogs\/#organization"},"image":{"@id":"https:\/\/data-flair.training\/blogs\/bucketing-in-hive\/#primaryimage"},"thumbnailUrl":"https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2018\/03\/Hive-Bucketing-01.jpg","keywords":["Advantages of Bucketing in Hive","Creation of Bucketed Tables","Features of Hive Bucketing","hive bucket external table","hive bucketing with examples","hive bucketing without partition","Limitations of Hive Bucketing","what is Hive Bucketing","Why Bucketing"],"articleSection":["Hive 
Tutorials"],"inLanguage":"en-US","potentialAction":[{"@type":"CommentAction","name":"Comment","target":["https:\/\/data-flair.training\/blogs\/bucketing-in-hive\/#respond"]}]},{"@type":"WebPage","@id":"https:\/\/data-flair.training\/blogs\/bucketing-in-hive\/","url":"https:\/\/data-flair.training\/blogs\/bucketing-in-hive\/","name":"Bucketing in Hive - Creation of Bucketed Table in Hive - DataFlair","isPartOf":{"@id":"https:\/\/data-flair.training\/blogs\/#website"},"primaryImageOfPage":{"@id":"https:\/\/data-flair.training\/blogs\/bucketing-in-hive\/#primaryimage"},"image":{"@id":"https:\/\/data-flair.training\/blogs\/bucketing-in-hive\/#primaryimage"},"thumbnailUrl":"https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2018\/03\/Hive-Bucketing-01.jpg","datePublished":"2018-03-14T00:00:27+00:00","description":"What is Bucketing in Hive,Features of Hive Bucketing, Why Bucketing in hive used,Advantages of Bucketing in Hive,Limitations of Hive Bucketing with examples","breadcrumb":{"@id":"https:\/\/data-flair.training\/blogs\/bucketing-in-hive\/#breadcrumb"},"inLanguage":"en-US","potentialAction":[{"@type":"ReadAction","target":["https:\/\/data-flair.training\/blogs\/bucketing-in-hive\/"]}]},{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/data-flair.training\/blogs\/bucketing-in-hive\/#primaryimage","url":"https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2018\/03\/Hive-Bucketing-01.jpg","contentUrl":"https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2018\/03\/Hive-Bucketing-01.jpg","width":1200,"height":628,"caption":"Introduction - Bucketing in Hive"},{"@type":"BreadcrumbList","@id":"https:\/\/data-flair.training\/blogs\/bucketing-in-hive\/#breadcrumb","itemListElement":[{"@type":"ListItem","position":1,"name":"Blog Home","item":"https:\/\/data-flair.training\/blogs\/"},{"@type":"ListItem","position":2,"name":"Hive 
Tutorials","item":"https:\/\/data-flair.training\/blogs\/category\/hive\/"},{"@type":"ListItem","position":3,"name":"Bucketing in Hive &#8211; Creation of Bucketed Table in Hive"}]},{"@type":"WebSite","@id":"https:\/\/data-flair.training\/blogs\/#website","url":"https:\/\/data-flair.training\/blogs\/","name":"DataFlair","description":"Learn Today. Lead Tomorrow.","publisher":{"@id":"https:\/\/data-flair.training\/blogs\/#organization"},"potentialAction":[{"@type":"SearchAction","target":{"@type":"EntryPoint","urlTemplate":"https:\/\/data-flair.training\/blogs\/?s={search_term_string}"},"query-input":{"@type":"PropertyValueSpecification","valueRequired":true,"valueName":"search_term_string"}}],"inLanguage":"en-US"},{"@type":"Organization","@id":"https:\/\/data-flair.training\/blogs\/#organization","name":"DataFlair","url":"https:\/\/data-flair.training\/blogs\/","logo":{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/data-flair.training\/blogs\/#\/schema\/logo\/image\/","url":"https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2016\/07\/Data-Flair.png","contentUrl":"https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2016\/07\/Data-Flair.png","width":106,"height":48,"caption":"DataFlair"},"image":{"@id":"https:\/\/data-flair.training\/blogs\/#\/schema\/logo\/image\/"},"sameAs":["https:\/\/www.facebook.com\/DataFlairWS\/","https:\/\/x.com\/DataFlairWS","https:\/\/www.linkedin.com\/company\/dataflair-web-services-pvt-ltd\/","https:\/\/www.youtube.com\/user\/DataFlairWS"]},{"@type":"Person","@id":"https:\/\/data-flair.training\/blogs\/#\/schema\/person\/beb0cab24b7aa54423a3b50e669a9dcd","name":"DataFlair 
Team","image":{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/secure.gravatar.com\/avatar\/c322416204232f4dd97ef3901b0a499a5d34d7ba7fe333f4bfe53a907873d293?s=96&d=mm&r=g","url":"https:\/\/secure.gravatar.com\/avatar\/c322416204232f4dd97ef3901b0a499a5d34d7ba7fe333f4bfe53a907873d293?s=96&d=mm&r=g","contentUrl":"https:\/\/secure.gravatar.com\/avatar\/c322416204232f4dd97ef3901b0a499a5d34d7ba7fe333f4bfe53a907873d293?s=96&d=mm&r=g","caption":"DataFlair Team"},"description":"DataFlair Team specializes in creating clear, actionable content on programming, Java, Python, C++, DSA, AI, ML, data Science, Android, Flutter, MERN, Web Development, and technology. Backed by industry expertise, we make learning easy and career-oriented for beginners and pros alike.","url":"https:\/\/data-flair.training\/blogs\/author\/dfteam3\/"}]}},"amp_enabled":true,"_links":{"self":[{"href":"https:\/\/data-flair.training\/blogs\/wp-json\/wp\/v2\/posts\/10581","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/data-flair.training\/blogs\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/data-flair.training\/blogs\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/data-flair.training\/blogs\/wp-json\/wp\/v2\/users\/7"}],"replies":[{"embeddable":true,"href":"https:\/\/data-flair.training\/blogs\/wp-json\/wp\/v2\/comments?post=10581"}],"version-history":[{"count":0,"href":"https:\/\/data-flair.training\/blogs\/wp-json\/wp\/v2\/posts\/10581\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/data-flair.training\/blogs\/wp-json\/wp\/v2\/media\/10744"}],"wp:attachment":[{"href":"https:\/\/data-flair.training\/blogs\/wp-json\/wp\/v2\/media?parent=10581"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/data-flair.training\/blogs\/wp-json\/wp\/v2\/categories?post=10581"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/data-flair.training\/blogs\/wp-json\/wp\/v2\/tags?post=10581"}],"curies":[{"
name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}