{"id":9738,"date":"2018-03-01T11:50:29","date_gmt":"2018-03-01T11:50:29","guid":{"rendered":"https:\/\/data-flair.training\/blogs\/?p=9738"},"modified":"2018-03-01T11:50:29","modified_gmt":"2018-03-01T11:50:29","slug":"hive-tutorial","status":"publish","type":"post","link":"https:\/\/data-flair.training\/blogs\/hive-tutorial\/","title":{"rendered":"Apache Hive Tutorial &#8211; A Single Best Comprehensive Guide for Beginner"},"content":{"rendered":"<p><span style=\"font-weight: 400\">We use <strong>Apache Hive<\/strong> to query and analyze large datasets stored in Hadoop files. This Apache Hive Tutorial explains what Apache Hive is and covers the main concepts around it. <\/span><\/p>\n<p><span style=\"font-weight: 400\">We start with what Hive is and why it is used, then look at its history and at the Hive architecture and its components. <\/span><\/p>\n<p><span style=\"font-weight: 400\">Afterwards, we cover how Hive works, its features and limitations, Hive vs Spark SQL, and Pig vs Hive vs Hadoop MapReduce.<\/span><\/p>\n<p>So, let&#8217;s start the Hive Tutorial.<\/p>\n<h2>What is Apache Hive?<\/h2>\n<p><span style=\"font-weight: 400\"><em>Apache Hive is an open source data warehouse system built on top of Hadoop<\/em>. We use it for querying and analyzing large datasets stored in Hadoop files, and it can process both structured and semi-structured data in <strong>Hadoop<\/strong>.<\/span><\/p>\n<p><span style=\"font-weight: 400\">In other words, it is a <em>data warehouse infrastructure that facilitates querying and managing large datasets residing in distributed storage<\/em>. 
It offers a way to query the data using a SQL-like query language called <strong>HiveQL (Hive Query Language)<\/strong>.<\/span><\/p>\n<p>Internally, a compiler translates HiveQL statements into <strong>MapReduce<\/strong> jobs, which are then submitted to the Hadoop framework for execution.<\/p>\n<p><strong>a. Hive is not<\/strong><\/p>\n<p><span style=\"font-weight: 400\">A few misconceptions about Hive are common, so let\u2019s clarify what Hive is not:<\/span><\/p>\n<ul>\n<li><span style=\"font-weight: 400\">A relational database<\/span><\/li>\n<li><span style=\"font-weight: 400\">A system designed for OnLine Transaction Processing (OLTP)<\/span><\/li>\n<li>A language for real-time queries and row-level updates<\/li>\n<\/ul>\n<h2><span style=\"font-weight: 400\">Why Hive?<\/span><\/h2>\n<p><span style=\"font-weight: 400\">In this section of the Hive tutorial, we discuss why we should use Apache Hive.<\/span><\/p>\n<p><span style=\"font-weight: 400\">Hive is mainly used for data querying, analysis, and summarization. It helps to improve developer productivity, but that comes at the cost of higher latency and lower efficiency. <\/span><\/p>\n<p><span style=\"font-weight: 400\">HiveQL is a variant of SQL, and it holds up well when compared to SQL systems implemented in databases. Hive has many User Defined Functions (UDFs), and it is easy to contribute new ones. <\/span><\/p>\n<p><span style=\"font-weight: 400\">Also, we can connect Hive queries to various Hadoop packages, such as RHive, RHipe, and even Apache Mahout. This greatly helps the developer community when working on complex analytical processing and challenging data formats.<\/span><\/p>\n<p><span style=\"font-weight: 400\">To be more specific, a \u2018data warehouse\u2019 is a system we use for reporting and data analysis. 
Data analysis refers to inspecting, cleaning, transforming, and modeling data with the goal of discovering useful information and suggesting conclusions. <\/span><\/p>\n<p><span style=\"font-weight: 400\">Across business, science, and social science domains, data analysis has multiple aspects and approaches, encompassing diverse techniques under a variety of names.<\/span><\/p>\n<p><span style=\"font-weight: 400\">In addition, Hive allows users to access the data simultaneously and improves response time, that is, the time a system or functional unit takes to react to a given input. On huge datasets, it can respond faster than most other types of queries over the same kind of data. <\/span><\/p>\n<p><span style=\"font-weight: 400\">Moreover, it is highly scalable: as the volume of data grows, more commodity machines can easily be added to the cluster without a drop in performance.<\/span><\/p>\n<h2><span style=\"font-weight: 400\">Hive Tutorial &#8211; History<\/span><\/h2>\n<p><span style=\"font-weight: 400\">The Data Infrastructure Team at Facebook developed Hive to address the requirements at Facebook. <\/span><\/p>\n<p><span style=\"font-weight: 400\">Internally, it is very popular with users at Facebook, where it is used to run thousands of jobs on the cluster for hundreds of users and a wide variety of applications.<\/span><\/p>\n<p><span style=\"font-weight: 400\">The Hive-Hadoop cluster at Facebook stores more than 2 PB of raw data and loads 15 TB of data on a daily basis.<\/span><\/p>\n<p><span style=\"font-weight: 400\">It is also used and developed by a number of other companies. 
Examples include Amazon, IBM, Yahoo, Netflix, and the Financial Industry Regulatory Authority (FINRA).<\/span><\/p>\n<h2><span style=\"font-weight: 400\">Hive Architecture<\/span><\/h2>\n<p><span style=\"font-weight: 400\">The diagram below shows the Hive architecture with its components:<\/span><\/p>\n<div id=\"attachment_9741\" style=\"width: 1210px\" class=\"wp-caption aligncenter\"><a href=\"https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2018\/03\/USER-INTERFACES-01.jpg\"><img loading=\"lazy\" decoding=\"async\" aria-describedby=\"caption-attachment-9741\" class=\"wp-image-9741 size-full\" src=\"https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2018\/03\/USER-INTERFACES-01.jpg\" alt=\"Apache Hive\" width=\"1200\" height=\"628\" srcset=\"https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2018\/03\/USER-INTERFACES-01.jpg 1200w, https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2018\/03\/USER-INTERFACES-01-150x79.jpg 150w, https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2018\/03\/USER-INTERFACES-01-300x157.jpg 300w, https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2018\/03\/USER-INTERFACES-01-768x402.jpg 768w, https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2018\/03\/USER-INTERFACES-01-1024x536.jpg 1024w\" sizes=\"auto, (max-width: 1200px) 100vw, 1200px\" \/><\/a><p id=\"caption-attachment-9741\" class=\"wp-caption-text\">Hive Tutorial &#8211; Hive Architecture<\/p><\/div>\n<p>There are several different units in this component diagram. Now, let&#8217;s describe each unit:<\/p>\n<p><strong>a. User Interface<\/strong><\/p>\n<p>Hive is data warehouse infrastructure software that enables interaction between the user and <strong>HDFS<\/strong>. It supports several user interfaces: the Hive Web UI, the Hive command line, and Hive HD Insight (on Windows Server).<\/p>\n<p><strong>b. 
Metastore<\/strong><\/p>\n<p>Hive uses a database server to store the schema, or metadata, of tables and databases: the columns in a table, their data types, and the HDFS mapping.<\/p>\n<p><strong>c. HiveQL Process Engine<\/strong><\/p>\n<p>HiveQL is similar to SQL and is used for querying against the schema information in the Metastore. It is one of the replacements for the traditional approach of writing a MapReduce program: instead of writing a MapReduce program in Java, we can write a query for the MapReduce job and process it.<\/p>\n<p><strong>d. Execution Engine<\/strong><\/p>\n<p>The Hive Execution Engine is the part that connects the HiveQL Process Engine and MapReduce. It processes the query and generates the same results a MapReduce job would. It uses the flavor of MapReduce.<\/p>\n<p><strong>e. HDFS or HBase<\/strong><\/p>\n<p>The Hadoop Distributed File System (HDFS) or <strong>HBase<\/strong> is the data storage technique used to store the data.<\/p>\n<h2>How Does Hive Work?<\/h2>\n<p><span style=\"font-weight: 400\">The following diagram depicts the workflow between Hive and Hadoop.<\/span><\/p>\n<div id=\"attachment_9742\" style=\"width: 1210px\" class=\"wp-caption aligncenter\"><a href=\"https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2018\/03\/Hive-Hadoop.png\"><img loading=\"lazy\" decoding=\"async\" aria-describedby=\"caption-attachment-9742\" class=\"wp-image-9742 size-full\" src=\"https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2018\/03\/Hive-Hadoop.png\" alt=\"Apache Hive\" width=\"1200\" height=\"628\" srcset=\"https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2018\/03\/Hive-Hadoop.png 1200w, https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2018\/03\/Hive-Hadoop-150x79.png 150w, https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2018\/03\/Hive-Hadoop-300x157.png 300w, 
https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2018\/03\/Hive-Hadoop-768x402.png 768w, https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2018\/03\/Hive-Hadoop-1024x536.png 1024w\" sizes=\"auto, (max-width: 1200px) 100vw, 1200px\" \/><\/a><p id=\"caption-attachment-9742\" class=\"wp-caption-text\">Apache Hive Tutorial &#8211; Working of Hive<\/p><\/div>\n<p>The following steps describe how Hive interacts with the Hadoop framework.<\/p>\n<p><strong>Step-1 Execute Query<\/strong><\/p>\n<p>First, the Hive interface (Command Line or Web UI) sends the query to the Driver (any database driver such as JDBC, ODBC, etc.) to execute.<\/p>\n<p><strong>Step-2 Get Plan<\/strong><\/p>\n<p>The driver then takes the help of the query compiler, which parses the query to check the syntax and the query plan, or the requirement of the query.<\/p>\n<p><strong>Step-3 Get Metadata<\/strong><\/p>\n<p>Next, the compiler sends a metadata request to the Metastore (any database).<\/p>\n<p><strong>Step-4 Send Metadata<\/strong><\/p>\n<p>The Metastore sends the metadata as a response to the compiler.<\/p>\n<p><strong>Step-5 Send Plan<\/strong><\/p>\n<p>The compiler then checks the requirement and sends the plan back to the driver. At this point, the parsing and compiling of the query are complete.<\/p>\n<p><strong>Step-6 Execute Plan<\/strong><\/p>\n<p>Next, the driver sends the execution plan to the execution engine.<\/p>\n<p><strong>Step-7 Execute Job<\/strong><\/p>\n<p>Internally, the execution job is a MapReduce job. The execution engine sends the job to the JobTracker, which runs on the name node, and the JobTracker assigns the job to a TaskTracker, which runs on a data node. 
At this point, the query runs as a MapReduce job.<\/p>\n<ul>\n<li><strong>Metadata Ops<\/strong><\/li>\n<\/ul>\n<p>During execution, the execution engine can perform metadata operations with the Metastore.<\/p>\n<p><strong>Step-8 Fetch Result<\/strong><\/p>\n<p>Once execution is over, the execution engine receives the results from the data nodes.<\/p>\n<p><strong>Step-9 Send Results<\/strong><\/p>\n<p>After fetching the results, the execution engine sends them to the driver.<\/p>\n<p><strong>Step-10 Send Results<\/strong><\/p>\n<p>Finally, the driver sends the results to the Hive interfaces.<\/p>\n<h2><span style=\"font-weight: 400\">Features of Hive<\/span><\/h2>\n<p>In this section of the Hive Tutorial, we look at the features of Apache Hive:<\/p>\n<ul>\n<li><span style=\"font-weight: 400\">Hive makes data summarization, querying, and analysis much easier.<\/span><\/li>\n<li><span style=\"font-weight: 400\">Hive supports external tables, which let us process data without actually storing it in <strong>HDFS<\/strong>.<\/span><\/li>\n<li><span style=\"font-weight: 400\">It fits the low-level interface requirement of Hadoop well.<\/span><\/li>\n<li><span style=\"font-weight: 400\">To improve performance, it supports partitioning of data at the table level.<\/span><\/li>\n<li><span style=\"font-weight: 400\">For optimizing logical plans, Hive has a rule-based optimizer.<\/span><\/li>\n<li><span style=\"font-weight: 400\">Hive is scalable, familiar, and extensible.<\/span><\/li>\n<li><span style=\"font-weight: 400\">Basic knowledge of SQL queries is enough to work with HiveQL. 
We don\u2019t need to know a programming language.<\/span><\/li>\n<li><span style=\"font-weight: 400\">By using Hive, it is possible to process structured data in Hadoop.<\/span><\/li>\n<li>Hive makes querying very simple, much like SQL.<\/li>\n<li><span style=\"font-weight: 400\">By using Hive, it is possible to run ad-hoc queries for data analysis.<\/span><\/li>\n<\/ul>\n<h2><span style=\"font-weight: 400\">Limitations of Hive<\/span><\/h2>\n<p><span style=\"font-weight: 400\">This section of the Apache Hive Tutorial discusses the following limitations of Hive:<\/span><\/p>\n<ul>\n<li><span style=\"font-weight: 400\">We cannot perform real-time queries with Hive, and it does not offer row-level updates.<\/span><\/li>\n<li><span style=\"font-weight: 400\">For interactive data browsing, Hive offers acceptable latency at best.<\/span><\/li>\n<li>Hive is not the right choice for online transaction processing.<\/li>\n<li>Latency for Hive queries is generally very high.<\/li>\n<\/ul>\n<h2>Apache Hive Tutorial &#8211; Usage<\/h2>\n<p>Here, we look at the following uses of Hive.<\/p>\n<ul>\n<li>We use Hive for schema flexibility as well as schema evolution.<\/li>\n<li>It is possible to partition and bucket tables in Apache Hive.<\/li>\n<li>We can use JDBC\/ODBC drivers, since they are available in Hive.<\/li>\n<\/ul>\n<h2>Hive vs Spark SQL<\/h2>\n<p><a href=\"https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2018\/03\/Hive-vs-Spark-SQL-01-3.jpg\"><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter wp-image-9790 size-full\" src=\"https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2018\/03\/Hive-vs-Spark-SQL-01-3.jpg\" alt=\"Apache Hive\" width=\"1200\" height=\"628\" srcset=\"https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2018\/03\/Hive-vs-Spark-SQL-01-3.jpg 1200w, 
https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2018\/03\/Hive-vs-Spark-SQL-01-3-150x79.jpg 150w, https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2018\/03\/Hive-vs-Spark-SQL-01-3-300x157.jpg 300w, https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2018\/03\/Hive-vs-Spark-SQL-01-3-768x402.jpg 768w, https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2018\/03\/Hive-vs-Spark-SQL-01-3-1024x536.jpg 1024w\" sizes=\"auto, (max-width: 1200px) 100vw, 1200px\" \/><\/a><\/p>\n<p>In this section of the Apache Hive tutorial, we compare Hive and Spark SQL in detail.<br \/>\n<strong>a. Initial release<\/strong><\/p>\n<ul>\n<li><strong>Apache Hive<\/strong><\/li>\n<\/ul>\n<p>Facebook open-sourced Hive in 2008, and it became a top-level Apache project in 2010.<\/p>\n<ul>\n<li><strong>Spark SQL<\/strong><\/li>\n<\/ul>\n<p>Spark SQL was first released in 2014.<br \/>\n<strong>b. Current release<\/strong><\/p>\n<ul>\n<li><strong>Apache Hive<\/strong><\/li>\n<\/ul>\n<p>At the time of writing, the current release is version 2.3.2, released on 18 November 2017.<\/p>\n<ul>\n<li><strong>Spark SQL<\/strong><\/li>\n<\/ul>\n<p>At the time of writing, the current release is version 2.1.2, released on 09 October 2017.<br \/>\n<strong>c. Developer<\/strong><\/p>\n<ul>\n<li><strong>Apache Hive<\/strong><\/li>\n<\/ul>\n<p>Facebook developed it originally and later donated it to the Apache Software Foundation, which has maintained it since.<\/p>\n<ul>\n<li><strong>Spark SQL<\/strong><\/li>\n<\/ul>\n<p>It has been developed as part of Apache Spark under the Apache Software Foundation.<br \/>\n<strong>d. Server operating systems<\/strong><\/p>\n<ul>\n<li><strong>Apache Hive<\/strong><\/li>\n<\/ul>\n<p>It runs on any operating system with a Java VM.<\/p>\n<ul>\n<li><strong>Spark SQL<\/strong><\/li>\n<\/ul>\n<p>Spark SQL supports several operating systems, for example Linux, OS X, and Windows.<br \/>\n<strong>e. 
Data Types<\/strong><\/p>\n<ul>\n<li><strong>Apache Hive<\/strong><\/li>\n<\/ul>\n<p>It supports predefined data types, for example float or date.<\/p>\n<ul>\n<li><strong>Spark SQL<\/strong><\/li>\n<\/ul>\n<p>Like Hive, it also supports predefined data types, for example float or date.<br \/>\n<strong>f. Support of SQL<\/strong><\/p>\n<ul>\n<li><strong>Apache Hive<\/strong><\/li>\n<\/ul>\n<p>It provides SQL-like DML and DDL statements.<\/p>\n<ul>\n<li><strong>Spark SQL<\/strong><\/li>\n<\/ul>\n<p>Like Hive, it also provides SQL-like DML and DDL statements.<\/p>\n<h2>Pig vs Hive vs Hadoop MapReduce<\/h2>\n<div id=\"attachment_9791\" style=\"width: 1210px\" class=\"wp-caption aligncenter\"><a href=\"https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2018\/03\/Hive-vs-Hadoop-MapReduce-vs-Pig-01.jpg\"><img loading=\"lazy\" decoding=\"async\" aria-describedby=\"caption-attachment-9791\" class=\"wp-image-9791 size-full\" src=\"https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2018\/03\/Hive-vs-Hadoop-MapReduce-vs-Pig-01.jpg\" alt=\"Apache Hive\" width=\"1200\" height=\"628\" srcset=\"https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2018\/03\/Hive-vs-Hadoop-MapReduce-vs-Pig-01.jpg 1200w, https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2018\/03\/Hive-vs-Hadoop-MapReduce-vs-Pig-01-150x79.jpg 150w, https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2018\/03\/Hive-vs-Hadoop-MapReduce-vs-Pig-01-300x157.jpg 300w, https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2018\/03\/Hive-vs-Hadoop-MapReduce-vs-Pig-01-768x402.jpg 768w, https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2018\/03\/Hive-vs-Hadoop-MapReduce-vs-Pig-01-1024x536.jpg 1024w\" sizes=\"auto, (max-width: 1200px) 100vw, 1200px\" \/><\/a><p id=\"caption-attachment-9791\" class=\"wp-caption-text\">Apache Hive Tutorial &#8211; Pig vs Hive vs Hadoop 
MapReduce<\/p><\/div>\n<p>This section of the Apache Hive tutorial compares Pig, Hive, and Hadoop MapReduce.<\/p>\n<p><strong>a. Language<\/strong><\/p>\n<ul>\n<li><strong>Hive<\/strong><\/li>\n<\/ul>\n<p>It uses a SQL-like query language.<\/p>\n<ul>\n<li><strong>MapReduce<\/strong><\/li>\n<\/ul>\n<p>It uses a compiled language.<\/p>\n<ul>\n<li><strong>Pig<\/strong><\/li>\n<\/ul>\n<p>It uses a scripting language.<\/p>\n<p><strong>b. Abstraction<\/strong><\/p>\n<ul>\n<li><strong>Hive<\/strong><\/li>\n<\/ul>\n<p>It has a high level of abstraction.<\/p>\n<ul>\n<li><strong>MapReduce<\/strong><\/li>\n<\/ul>\n<p>It has a low level of abstraction.<\/p>\n<ul>\n<li><strong>Pig<\/strong><\/li>\n<\/ul>\n<p>It has a high level of abstraction.<\/p>\n<p><strong>c. Lines of code<\/strong><\/p>\n<ul>\n<li><strong>Hive<\/strong><\/li>\n<\/ul>\n<p>Comparatively fewer lines of code than both MapReduce and Pig.<\/p>\n<ul>\n<li><strong>MapReduce<\/strong><\/li>\n<\/ul>\n<p>It needs more lines of code.<\/p>\n<ul>\n<li><strong>Pig<\/strong><\/li>\n<\/ul>\n<p>Comparatively fewer lines of code than MapReduce.<\/p>\n<p><strong>d. Development Efforts<\/strong><\/p>\n<ul>\n<li><strong>Hive<\/strong><\/li>\n<\/ul>\n<p>Comparatively less development effort than both MapReduce and Pig.<\/p>\n<ul>\n<li><strong>MapReduce<\/strong><\/li>\n<\/ul>\n<p>More development effort is involved.<\/p>\n<ul>\n<li><strong>Pig<\/strong><\/li>\n<\/ul>\n<p>Comparatively less development effort.<\/p>\n<p><strong>e. Code Efficiency<\/strong><\/p>\n<ul>\n<li><strong>Hive<\/strong><\/li>\n<\/ul>\n<p>Code efficiency is relatively low.<\/p>\n<ul>\n<li><strong>MapReduce<\/strong><\/li>\n<\/ul>\n<p>It has high code efficiency.<\/p>\n<ul>\n<li><strong>Pig<\/strong><\/li>\n<\/ul>\n<p>Code efficiency is relatively low.<\/p>\n<p>So, that was all for this Apache Hive Tutorial. 
We hope you liked our explanation.<\/p>\n<h2><span style=\"font-weight: 400\">Conclusion &#8211; Hive Tutorial<\/span><\/h2>\n<p><span style=\"font-weight: 400\">In this Apache Hive tutorial, we have seen the concept of Apache Hive, including why Hive is needed, Hive history, Hive architecture, features and limitations of Hive, Hive vs Spark SQL, and Pig vs Hive vs Hadoop MapReduce. <\/span><\/p>\n<p><span style=\"font-weight: 400\">If you have any questions about this Apache Hive tutorial, feel free to ask in the comment section.<\/span><\/p>\n","protected":false},"excerpt":{"rendered":"<p>Basically, for querying and analyzing large datasets stored in Hadoop files we use Apache Hive. However, there are many more concepts of Hive, that all we will discuss in this Apache Hive Tutorial, you&#46;&#46;&#46;<\/p>\n","protected":false},"author":7,"featured_media":10029,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[26],"tags":[814,5678,5719,5741,5795,5810,5811,6994,9523,15588,15750,16660],"class_list":["post-9738","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-hive","tag-apache-hive-tutorial","tag-hive-architecture","tag-hive-history","tag-hive-introduction","tag-hive-tutorial","tag-hive-vs-spark-sql","tag-hive-works","tag-introduction-to-apache-hive","tag-pig-vs-hive-vs-hadoop-mapreduce","tag-what-is-apache-hive","tag-what-is-hive","tag-why-hive-is-used"],"yoast_head":"<!-- This site is optimized with the Yoast SEO plugin v28.0 - https:\/\/yoast.com\/product\/yoast-seo-wordpress\/ -->\n<title>Apache Hive Tutorial - A Single Best Comprehensive Guide for Beginner - DataFlair<\/title>\n<meta name=\"description\" content=\"Apache Hive Tutorial-What is Apache Hive, why hives, hive history, hive architecture,hive works,hive vs spark SQL,pig vs hive vs hadoop mapreduce,learn hive\" \/>\n<meta name=\"robots\" content=\"index, 
follow, max-snippet:-1, max-image-preview:large, max-video-preview:-1\" \/>\n<link rel=\"canonical\" href=\"https:\/\/data-flair.training\/blogs\/hive-tutorial\/\" \/>\n<meta property=\"og:locale\" content=\"en_US\" \/>\n<meta property=\"og:type\" content=\"article\" \/>\n<meta property=\"og:title\" content=\"Apache Hive Tutorial - A Single Best Comprehensive Guide for Beginner - DataFlair\" \/>\n<meta property=\"og:description\" content=\"Apache Hive Tutorial-What is Apache Hive, why hives, hive history, hive architecture,hive works,hive vs spark SQL,pig vs hive vs hadoop mapreduce,learn hive\" \/>\n<meta property=\"og:url\" content=\"https:\/\/data-flair.training\/blogs\/hive-tutorial\/\" \/>\n<meta property=\"og:site_name\" content=\"DataFlair\" \/>\n<meta property=\"article:publisher\" content=\"https:\/\/www.facebook.com\/DataFlairWS\/\" \/>\n<meta property=\"article:published_time\" content=\"2018-03-01T11:50:29+00:00\" \/>\n<meta property=\"og:image\" content=\"https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2018\/03\/Introduction-to-Apache-Hive.jpg\" \/>\n\t<meta property=\"og:image:width\" content=\"1200\" \/>\n\t<meta property=\"og:image:height\" content=\"628\" \/>\n\t<meta property=\"og:image:type\" content=\"image\/jpeg\" \/>\n<meta name=\"author\" content=\"DataFlair Team\" \/>\n<meta name=\"twitter:card\" content=\"summary_large_image\" \/>\n<meta name=\"twitter:creator\" content=\"@DataFlairWS\" \/>\n<meta name=\"twitter:site\" content=\"@DataFlairWS\" \/>\n<meta name=\"twitter:label1\" content=\"Written by\" \/>\n\t<meta name=\"twitter:data1\" content=\"DataFlair Team\" \/>\n\t<meta name=\"twitter:label2\" content=\"Est. reading time\" \/>\n\t<meta name=\"twitter:data2\" content=\"9 minutes\" \/>\n<!-- \/ Yoast SEO plugin. 
-->","yoast_head_json":{"title":"Apache Hive Tutorial - A Single Best Comprehensive Guide for Beginner - DataFlair","description":"Apache Hive Tutorial-What is Apache Hive, why hives, hive history, hive architecture,hive works,hive vs spark SQL,pig vs hive vs hadoop mapreduce,learn hive","robots":{"index":"index","follow":"follow","max-snippet":"max-snippet:-1","max-image-preview":"max-image-preview:large","max-video-preview":"max-video-preview:-1"},"canonical":"https:\/\/data-flair.training\/blogs\/hive-tutorial\/","og_locale":"en_US","og_type":"article","og_title":"Apache Hive Tutorial - A Single Best Comprehensive Guide for Beginner - DataFlair","og_description":"Apache Hive Tutorial-What is Apache Hive, why hives, hive history, hive architecture,hive works,hive vs spark SQL,pig vs hive vs hadoop mapreduce,learn hive","og_url":"https:\/\/data-flair.training\/blogs\/hive-tutorial\/","og_site_name":"DataFlair","article_publisher":"https:\/\/www.facebook.com\/DataFlairWS\/","article_published_time":"2018-03-01T11:50:29+00:00","og_image":[{"width":1200,"height":628,"url":"https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2018\/03\/Introduction-to-Apache-Hive.jpg","type":"image\/jpeg"}],"author":"DataFlair Team","twitter_card":"summary_large_image","twitter_creator":"@DataFlairWS","twitter_site":"@DataFlairWS","twitter_misc":{"Written by":"DataFlair Team","Est. 
reading time":"9 minutes"},"schema":{"@context":"https:\/\/schema.org","@graph":[{"@type":"Article","@id":"https:\/\/data-flair.training\/blogs\/hive-tutorial\/#article","isPartOf":{"@id":"https:\/\/data-flair.training\/blogs\/hive-tutorial\/"},"author":{"name":"DataFlair Team","@id":"https:\/\/data-flair.training\/blogs\/#\/schema\/person\/beb0cab24b7aa54423a3b50e669a9dcd"},"headline":"Apache Hive Tutorial &#8211; A Single Best Comprehensive Guide for Beginner","datePublished":"2018-03-01T11:50:29+00:00","mainEntityOfPage":{"@id":"https:\/\/data-flair.training\/blogs\/hive-tutorial\/"},"wordCount":1840,"commentCount":2,"publisher":{"@id":"https:\/\/data-flair.training\/blogs\/#organization"},"image":{"@id":"https:\/\/data-flair.training\/blogs\/hive-tutorial\/#primaryimage"},"thumbnailUrl":"https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2018\/03\/Introduction-to-Apache-Hive.jpg","keywords":["Apache Hive tutorial","hive architecture","hive history","hive introduction","hive tutorial","hive vs spark SQL","hive works","Introduction to Apache Hive","pig vs hive vs hadoop mapreduce","what is Apache Hive","What is Hive","Why Hive is used"],"articleSection":["Hive Tutorials"],"inLanguage":"en-US","potentialAction":[{"@type":"CommentAction","name":"Comment","target":["https:\/\/data-flair.training\/blogs\/hive-tutorial\/#respond"]}]},{"@type":"WebPage","@id":"https:\/\/data-flair.training\/blogs\/hive-tutorial\/","url":"https:\/\/data-flair.training\/blogs\/hive-tutorial\/","name":"Apache Hive Tutorial - A Single Best Comprehensive Guide for Beginner - 
DataFlair","isPartOf":{"@id":"https:\/\/data-flair.training\/blogs\/#website"},"primaryImageOfPage":{"@id":"https:\/\/data-flair.training\/blogs\/hive-tutorial\/#primaryimage"},"image":{"@id":"https:\/\/data-flair.training\/blogs\/hive-tutorial\/#primaryimage"},"thumbnailUrl":"https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2018\/03\/Introduction-to-Apache-Hive.jpg","datePublished":"2018-03-01T11:50:29+00:00","description":"Apache Hive Tutorial-What is Apache Hive, why hives, hive history, hive architecture,hive works,hive vs spark SQL,pig vs hive vs hadoop mapreduce,learn hive","breadcrumb":{"@id":"https:\/\/data-flair.training\/blogs\/hive-tutorial\/#breadcrumb"},"inLanguage":"en-US","potentialAction":[{"@type":"ReadAction","target":["https:\/\/data-flair.training\/blogs\/hive-tutorial\/"]}]},{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/data-flair.training\/blogs\/hive-tutorial\/#primaryimage","url":"https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2018\/03\/Introduction-to-Apache-Hive.jpg","contentUrl":"https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2018\/03\/Introduction-to-Apache-Hive.jpg","width":1200,"height":628,"caption":"Apache Hive Tutorial- Hive introduction"},{"@type":"BreadcrumbList","@id":"https:\/\/data-flair.training\/blogs\/hive-tutorial\/#breadcrumb","itemListElement":[{"@type":"ListItem","position":1,"name":"Blog Home","item":"https:\/\/data-flair.training\/blogs\/"},{"@type":"ListItem","position":2,"name":"Hive Tutorials","item":"https:\/\/data-flair.training\/blogs\/category\/hive\/"},{"@type":"ListItem","position":3,"name":"Apache Hive Tutorial &#8211; A Single Best Comprehensive Guide for Beginner"}]},{"@type":"WebSite","@id":"https:\/\/data-flair.training\/blogs\/#website","url":"https:\/\/data-flair.training\/blogs\/","name":"DataFlair","description":"Learn Today. 
Lead Tomorrow.","publisher":{"@id":"https:\/\/data-flair.training\/blogs\/#organization"},"potentialAction":[{"@type":"SearchAction","target":{"@type":"EntryPoint","urlTemplate":"https:\/\/data-flair.training\/blogs\/?s={search_term_string}"},"query-input":{"@type":"PropertyValueSpecification","valueRequired":true,"valueName":"search_term_string"}}],"inLanguage":"en-US"},{"@type":"Organization","@id":"https:\/\/data-flair.training\/blogs\/#organization","name":"DataFlair","url":"https:\/\/data-flair.training\/blogs\/","logo":{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/data-flair.training\/blogs\/#\/schema\/logo\/image\/","url":"https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2016\/07\/Data-Flair.png","contentUrl":"https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2016\/07\/Data-Flair.png","width":106,"height":48,"caption":"DataFlair"},"image":{"@id":"https:\/\/data-flair.training\/blogs\/#\/schema\/logo\/image\/"},"sameAs":["https:\/\/www.facebook.com\/DataFlairWS\/","https:\/\/x.com\/DataFlairWS","https:\/\/www.linkedin.com\/company\/dataflair-web-services-pvt-ltd\/","https:\/\/www.youtube.com\/user\/DataFlairWS"]},{"@type":"Person","@id":"https:\/\/data-flair.training\/blogs\/#\/schema\/person\/beb0cab24b7aa54423a3b50e669a9dcd","name":"DataFlair Team","image":{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/secure.gravatar.com\/avatar\/c322416204232f4dd97ef3901b0a499a5d34d7ba7fe333f4bfe53a907873d293?s=96&d=mm&r=g","url":"https:\/\/secure.gravatar.com\/avatar\/c322416204232f4dd97ef3901b0a499a5d34d7ba7fe333f4bfe53a907873d293?s=96&d=mm&r=g","contentUrl":"https:\/\/secure.gravatar.com\/avatar\/c322416204232f4dd97ef3901b0a499a5d34d7ba7fe333f4bfe53a907873d293?s=96&d=mm&r=g","caption":"DataFlair Team"},"description":"DataFlair Team specializes in creating clear, actionable content on programming, Java, Python, C++, DSA, AI, ML, data Science, Android, Flutter, MERN, Web Development, and technology. 
Backed by industry expertise, we make learning easy and career-oriented for beginners and pros alike.","url":"https:\/\/data-flair.training\/blogs\/author\/dfteam3\/"}]}},"amp_enabled":true,"_links":{"self":[{"href":"https:\/\/data-flair.training\/blogs\/wp-json\/wp\/v2\/posts\/9738","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/data-flair.training\/blogs\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/data-flair.training\/blogs\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/data-flair.training\/blogs\/wp-json\/wp\/v2\/users\/7"}],"replies":[{"embeddable":true,"href":"https:\/\/data-flair.training\/blogs\/wp-json\/wp\/v2\/comments?post=9738"}],"version-history":[{"count":0,"href":"https:\/\/data-flair.training\/blogs\/wp-json\/wp\/v2\/posts\/9738\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/data-flair.training\/blogs\/wp-json\/wp\/v2\/media\/10029"}],"wp:attachment":[{"href":"https:\/\/data-flair.training\/blogs\/wp-json\/wp\/v2\/media?parent=9738"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/data-flair.training\/blogs\/wp-json\/wp\/v2\/categories?post=9738"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/data-flair.training\/blogs\/wp-json\/wp\/v2\/tags?post=9738"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}