

{"id":20016,"date":"2018-07-08T04:00:25","date_gmt":"2018-07-08T04:00:25","guid":{"rendered":"https:\/\/data-flair.training\/blogs\/?p=20016"},"modified":"2018-07-08T04:00:25","modified_gmt":"2018-07-08T04:00:25","slug":"hcatalog-and-pig-integration","status":"publish","type":"post","link":"https:\/\/data-flair.training\/blogs\/hcatalog-and-pig-integration\/","title":{"rendered":"HCatalog and Pig Integration | Accessing Pig With HCatalog"},"content":{"rendered":"<p><span style=\"font-weight: 400\">In our last <strong>HCatalog tutorial<\/strong>, we discussed the <strong>HCatalog loader and storer<\/strong>. Today, we will look at HCatalog and <strong>Pig<\/strong> integration; HCatalog integrates easily with Pig.<\/span><\/p>\n<p><span style=\"font-weight: 400\">We will also walk through an example of HCatalog and Pig integration to understand it better.<\/span><\/p>\n<p><span style=\"font-weight: 400\">So, let&#8217;s start with HCatalog and Pig integration.<\/span><\/p>\n<h2><span style=\"font-weight: 400\">Running Pig with HCatalog<\/span><\/h2>\n<p><span style=\"font-weight: 400\">Pig does not pick up the HCatalog jars automatically. To bring in the necessary jars, we can either use a flag in the <strong>pig command<\/strong> or set the environment variables PIG_CLASSPATH and PIG_OPTS, as described below.<\/span><\/p>\n<h3>a. The -useHCatalog Flag<\/h3>\n<p><span style=\"font-weight: 400\">To bring in the appropriate jars for working with HCatalog, simply include the following flag:<\/span><br \/>\n<strong>pig -useHCatalog<\/strong><\/p>\n<h3>b. Jars and Configuration Files<\/h3>\n<p><span style=\"font-weight: 400\">For Pig commands that omit -useHCatalog, we need to tell Pig where to find our HCatalog jars and the <strong>Hive<\/strong> jars used by the HCatalog client. 
To do this, we need to define the environment variable PIG_CLASSPATH with the appropriate jars.<\/span><\/p>\n<p><span style=\"font-weight: 400\">HCatalog can tell us which jars it needs, but to do so it must know where Hadoop and Hive are installed. We also need to tell Pig the URI of our metastore in the PIG_OPTS variable.<\/span><br \/>\n<span style=\"font-weight: 400\">If we have <strong>installed Hadoop<\/strong> and Hive via tar, we can do the following:<\/span><\/p>\n<pre class=\"EnlighterJSRAW\">export HADOOP_HOME=&lt;path_to_hadoop_install&gt;\nexport HIVE_HOME=&lt;path_to_hive_install&gt;\nexport HCAT_HOME=&lt;path_to_hcat_install&gt;\nexport PIG_CLASSPATH=$HCAT_HOME\/share\/hcatalog\/hcatalog-core*.jar:\\\n$HCAT_HOME\/share\/hcatalog\/hcatalog-pig-adapter*.jar:\\\n$HIVE_HOME\/lib\/hive-metastore-*.jar:$HIVE_HOME\/lib\/libthrift-*.jar:\\\n$HIVE_HOME\/lib\/hive-exec-*.jar:$HIVE_HOME\/lib\/libfb303-*.jar:\\\n$HIVE_HOME\/lib\/jdo2-api-*-ec.jar:$HIVE_HOME\/conf:$HADOOP_HOME\/conf:\\\n$HIVE_HOME\/lib\/slf4j-api-*.jar\nexport PIG_OPTS=-Dhive.metastore.uris=thrift:\/\/&lt;hostname&gt;:&lt;port&gt;<\/pre>\n<p><span style=\"font-weight: 400\">Alternatively, we can pass the jars on the command line:<\/span><\/p>\n<pre class=\"EnlighterJSRAW\">&lt;path_to_pig_install&gt;\/bin\/pig -Dpig.additional.jars=\\\n$HCAT_HOME\/share\/hcatalog\/hcatalog-core*.jar:\\\n$HCAT_HOME\/share\/hcatalog\/hcatalog-pig-adapter*.jar:\\\n$HIVE_HOME\/lib\/hive-metastore-*.jar:$HIVE_HOME\/lib\/libthrift-*.jar:\\\n$HIVE_HOME\/lib\/hive-exec-*.jar:$HIVE_HOME\/lib\/libfb303-*.jar:\\\n$HIVE_HOME\/lib\/jdo2-api-*-ec.jar:$HIVE_HOME\/lib\/slf4j-api-*.jar &lt;script.pig&gt;<\/pre>\n<p><span style=\"font-weight: 
400\">In each file path, * stands for the version number of the installed jar. For example, HCatalog release 0.5.0 uses the following jars and conf files:<\/span><\/p>\n<p><strong>$HCAT_HOME\/share\/hcatalog\/hcatalog-core-0.5.0.jar<\/strong><br \/>\n<strong>$HCAT_HOME\/share\/hcatalog\/hcatalog-pig-adapter-0.5.0.jar<\/strong><br \/>\n<strong>$HIVE_HOME\/lib\/hive-metastore-0.10.0.jar<\/strong><br \/>\n<strong>$HIVE_HOME\/lib\/libthrift-0.7.0.jar<\/strong><br \/>\n<strong>$HIVE_HOME\/lib\/hive-exec-0.10.0.jar<\/strong><br \/>\n<strong>$HIVE_HOME\/lib\/libfb303-0.7.0.jar<\/strong><br \/>\n<strong>$HIVE_HOME\/lib\/jdo2-api-2.3-ec.jar<\/strong><br \/>\n<strong>$HIVE_HOME\/conf<\/strong><br \/>\n<strong>$HADOOP_HOME\/conf<\/strong><br \/>\n<strong>$HIVE_HOME\/lib\/slf4j-api-1.6.1.jar<\/strong><\/p>\n<h3>c. Authentication<\/h3>\n<p><span style=\"font-weight: 400\">If you are using a secure cluster and a failure results in a message like &#8220;2010-11-03 16:17:28,225 WARN hive.metastore &#8230; &#8211; Unable to connect metastore with URI thrift:\/\/&#8230;&#8221; in \/tmp\/&lt;username&gt;\/hive.log, make sure you have run &#8220;kinit &lt;username&gt;@FOO.COM&#8221; to get a Kerberos ticket and authenticate to the HCatalog server.<\/span><\/p>\n<h2>Example of HCatalog and Pig Integration<\/h2>\n<p>For example, suppose we have a file named employee_details.txt in HDFS with the following content:<\/p>\n<p>employee_details.txt<br \/>\n001, Mehul, Chourey, 21, 9848022337, Hyderabad<br \/>\n002, Prerna, Tripathi, 22, 9848022338, Chennai<br \/>\n003, Shreyash, Tiwari, 22, 9848022339, Delhi<br \/>\n004, Kajal, Jain, 21, 9848022330, Goa<br \/>\n005, Revti, Vadjikar, 23, 9848022336, Bangalore<br \/>\n006, Rishabh, Jaiswal, 23, 9848022335, Pune<br \/>\n007, Sagar, Joshi, 24, 9848022334, Mumbai<br \/>\n008, Vaishnavi, Dubey, 24, 9848022333, Indore<\/p>\n<p>We also have a sample script named sample1_script.pig in the same HDFS directory. It contains statements that perform operations and transformations on the employee relation:<\/p>\n<pre class=\"EnlighterJSRAW\">employee = LOAD 'hdfs:\/\/localhost:9000\/pig_data\/employee_details.txt'\n    USING PigStorage(',')\n    AS (id:int, firstname:chararray, lastname:chararray, age:int, phone:chararray, city:chararray);\nemployee_order = ORDER employee BY age DESC;\nSTORE employee_order INTO 'employee_order_table' USING org.apache.hive.hcatalog.pig.HCatStorer();\nemployee_limit = LIMIT employee_order 4;\nDUMP employee_limit;<\/pre>\n<p>The first statement loads the data in employee_details.txt into a relation named employee.<\/p>\n<p>The second statement sorts the tuples of the relation in descending order of age and stores the result as employee_order.<\/p>\n<p>The third statement stores the processed data in employee_order in a separate table named employee_order_table. HCatStorer writes into an existing table, so employee_order_table must be created in the metastore (for example, through Hive) before the script runs. For Hive releases before 0.12.0, use org.apache.hcatalog.pig.HCatStorer instead.<\/p>\n<p>The fourth statement stores the first four tuples of employee_order as employee_limit.<\/p>\n<p>Finally, the fifth statement dumps the contents of the relation employee_limit.<\/p>\n<p>Next, execute sample1_script.pig as follows:<\/p>\n<pre class=\"EnlighterJSRAW\">$ .\/pig -useHCatalog hdfs:\/\/localhost:9000\/pig_data\/sample1_script.pig<\/pre>\n<p>After execution, check the output directory (hdfs: user\/tmp\/hive) for the output files (part_0000, part_0001).<\/p>\n<p>So, this was all about HCatalog and Pig integration. 
We hope it helps.<\/p>\n<h2>Conclusion<\/h2>\n<p>Hence, we have covered HCatalog and Pig integration in detail: how to run Pig with HCatalog, which jars and configuration files it needs, and a worked example. If you still have any doubts regarding HCatalog and Pig integration, ask in the comment section.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>In our last HCatalog tutorial, we discussed HCatalog loader and storer. Today, we will see HCatalog and Pig Integration. We can\u00a0easily integrate HCatalog with Pig. Moreover, we will also see the example of HCatalog&#46;&#46;&#46;<\/p>\n","protected":false},"author":7,"featured_media":20777,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[24],"tags":[225,1241,5505,5535,7344,11668,15222],"class_list":["post-20016","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-hcatalog","tag-accessing-pig-with-hcatalog","tag-authentication","tag-hcatalog-and-pig-integration","tag-hcatalog-tutorial","tag-jars-and-configuration-files","tag-running-pig-with-hcatalog","tag-usehcatalog-flag"],"yoast_head":"<!-- This site is optimized with the Yoast SEO plugin v27.8 - https:\/\/yoast.com\/product\/yoast-seo-wordpress\/ -->\n<title>HCatalog and Pig Integration | Accessing Pig With HCatalog - DataFlair<\/title>\n<meta name=\"description\" content=\"HCatalog and Pig Integration,how to run HCatalog with Pig,accessing HCatalog with PIg,HCatalog tutorial,jars &amp; Configuration files\" \/>\n<meta name=\"robots\" content=\"index, follow, max-snippet:-1, max-image-preview:large, max-video-preview:-1\" \/>\n<link rel=\"canonical\" href=\"https:\/\/data-flair.training\/blogs\/hcatalog-and-pig-integration\/\" \/>\n<meta property=\"og:locale\" content=\"en_US\" \/>\n<meta property=\"og:type\" content=\"article\" \/>\n<meta property=\"og:title\" content=\"HCatalog and Pig Integration | Accessing Pig With HCatalog - DataFlair\" \/>\n<meta 
property=\"og:description\" content=\"HCatalog and Pig Integration,how to run HCatalog with Pig,accessing HCatalog with PIg,HCatalog tutorial,jars &amp; Configuration files\" \/>\n<meta property=\"og:url\" content=\"https:\/\/data-flair.training\/blogs\/hcatalog-and-pig-integration\/\" \/>\n<meta property=\"og:site_name\" content=\"DataFlair\" \/>\n<meta property=\"article:publisher\" content=\"https:\/\/www.facebook.com\/DataFlairWS\/\" \/>\n<meta property=\"article:published_time\" content=\"2018-07-08T04:00:25+00:00\" \/>\n<meta property=\"og:image\" content=\"https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2018\/07\/HCatalog-and-Pig-Integration-01-1.jpg\" \/>\n\t<meta property=\"og:image:width\" content=\"1200\" \/>\n\t<meta property=\"og:image:height\" content=\"628\" \/>\n\t<meta property=\"og:image:type\" content=\"image\/jpeg\" \/>\n<meta name=\"author\" content=\"DataFlair Team\" \/>\n<meta name=\"twitter:card\" content=\"summary_large_image\" \/>\n<meta name=\"twitter:creator\" content=\"@DataFlairWS\" \/>\n<meta name=\"twitter:site\" content=\"@DataFlairWS\" \/>\n<meta name=\"twitter:label1\" content=\"Written by\" \/>\n\t<meta name=\"twitter:data1\" content=\"DataFlair Team\" \/>\n\t<meta name=\"twitter:label2\" content=\"Est. reading time\" \/>\n\t<meta name=\"twitter:data2\" content=\"3 minutes\" \/>\n<!-- \/ Yoast SEO plugin. 
-->","yoast_head_json":{"title":"HCatalog and Pig Integration | Accessing Pig With HCatalog - DataFlair","description":"HCatalog and Pig Integration,how to run HCatalog with Pig,accessing HCatalog with PIg,HCatalog tutorial,jars & Configuration files","robots":{"index":"index","follow":"follow","max-snippet":"max-snippet:-1","max-image-preview":"max-image-preview:large","max-video-preview":"max-video-preview:-1"},"canonical":"https:\/\/data-flair.training\/blogs\/hcatalog-and-pig-integration\/","og_locale":"en_US","og_type":"article","og_title":"HCatalog and Pig Integration | Accessing Pig With HCatalog - DataFlair","og_description":"HCatalog and Pig Integration,how to run HCatalog with Pig,accessing HCatalog with PIg,HCatalog tutorial,jars & Configuration files","og_url":"https:\/\/data-flair.training\/blogs\/hcatalog-and-pig-integration\/","og_site_name":"DataFlair","article_publisher":"https:\/\/www.facebook.com\/DataFlairWS\/","article_published_time":"2018-07-08T04:00:25+00:00","og_image":[{"width":1200,"height":628,"url":"https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2018\/07\/HCatalog-and-Pig-Integration-01-1.jpg","type":"image\/jpeg"}],"author":"DataFlair Team","twitter_card":"summary_large_image","twitter_creator":"@DataFlairWS","twitter_site":"@DataFlairWS","twitter_misc":{"Written by":"DataFlair Team","Est. 
reading time":"3 minutes"},"schema":{"@context":"https:\/\/schema.org","@graph":[{"@type":"Article","@id":"https:\/\/data-flair.training\/blogs\/hcatalog-and-pig-integration\/#article","isPartOf":{"@id":"https:\/\/data-flair.training\/blogs\/hcatalog-and-pig-integration\/"},"author":{"name":"DataFlair Team","@id":"https:\/\/data-flair.training\/blogs\/#\/schema\/person\/beb0cab24b7aa54423a3b50e669a9dcd"},"headline":"HCatalog and Pig Integration | Accessing Pig With HCatalog","datePublished":"2018-07-08T04:00:25+00:00","mainEntityOfPage":{"@id":"https:\/\/data-flair.training\/blogs\/hcatalog-and-pig-integration\/"},"wordCount":797,"commentCount":0,"publisher":{"@id":"https:\/\/data-flair.training\/blogs\/#organization"},"image":{"@id":"https:\/\/data-flair.training\/blogs\/hcatalog-and-pig-integration\/#primaryimage"},"thumbnailUrl":"https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2018\/07\/HCatalog-and-Pig-Integration-01-1.jpg","keywords":["Accessing Pig With HCatalog","Authentication","HCatalog and Pig Integration","HCatalog Tutorial","Jars and Configuration Files","Running Pig with HCatalog","useHCatalog Flag"],"articleSection":["HCatalog Tutorials"],"inLanguage":"en-US","potentialAction":[{"@type":"CommentAction","name":"Comment","target":["https:\/\/data-flair.training\/blogs\/hcatalog-and-pig-integration\/#respond"]}]},{"@type":"WebPage","@id":"https:\/\/data-flair.training\/blogs\/hcatalog-and-pig-integration\/","url":"https:\/\/data-flair.training\/blogs\/hcatalog-and-pig-integration\/","name":"HCatalog and Pig Integration | Accessing Pig With HCatalog - 
DataFlair","isPartOf":{"@id":"https:\/\/data-flair.training\/blogs\/#website"},"primaryImageOfPage":{"@id":"https:\/\/data-flair.training\/blogs\/hcatalog-and-pig-integration\/#primaryimage"},"image":{"@id":"https:\/\/data-flair.training\/blogs\/hcatalog-and-pig-integration\/#primaryimage"},"thumbnailUrl":"https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2018\/07\/HCatalog-and-Pig-Integration-01-1.jpg","datePublished":"2018-07-08T04:00:25+00:00","description":"HCatalog and Pig Integration,how to run HCatalog with Pig,accessing HCatalog with PIg,HCatalog tutorial,jars & Configuration files","breadcrumb":{"@id":"https:\/\/data-flair.training\/blogs\/hcatalog-and-pig-integration\/#breadcrumb"},"inLanguage":"en-US","potentialAction":[{"@type":"ReadAction","target":["https:\/\/data-flair.training\/blogs\/hcatalog-and-pig-integration\/"]}]},{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/data-flair.training\/blogs\/hcatalog-and-pig-integration\/#primaryimage","url":"https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2018\/07\/HCatalog-and-Pig-Integration-01-1.jpg","contentUrl":"https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2018\/07\/HCatalog-and-Pig-Integration-01-1.jpg","width":1200,"height":628,"caption":"HCatalog and Pig Integration | Accessing Pig With HCatalog"},{"@type":"BreadcrumbList","@id":"https:\/\/data-flair.training\/blogs\/hcatalog-and-pig-integration\/#breadcrumb","itemListElement":[{"@type":"ListItem","position":1,"name":"Blog Home","item":"https:\/\/data-flair.training\/blogs\/"},{"@type":"ListItem","position":2,"name":"HCatalog Tutorials","item":"https:\/\/data-flair.training\/blogs\/category\/hcatalog\/"},{"@type":"ListItem","position":3,"name":"HCatalog and Pig Integration | Accessing Pig With HCatalog"}]},{"@type":"WebSite","@id":"https:\/\/data-flair.training\/blogs\/#website","url":"https:\/\/data-flair.training\/blogs\/","name":"DataFlair","description":"Learn Today. 
Lead Tomorrow.","publisher":{"@id":"https:\/\/data-flair.training\/blogs\/#organization"},"potentialAction":[{"@type":"SearchAction","target":{"@type":"EntryPoint","urlTemplate":"https:\/\/data-flair.training\/blogs\/?s={search_term_string}"},"query-input":{"@type":"PropertyValueSpecification","valueRequired":true,"valueName":"search_term_string"}}],"inLanguage":"en-US"},{"@type":"Organization","@id":"https:\/\/data-flair.training\/blogs\/#organization","name":"DataFlair","url":"https:\/\/data-flair.training\/blogs\/","logo":{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/data-flair.training\/blogs\/#\/schema\/logo\/image\/","url":"https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2016\/07\/Data-Flair.png","contentUrl":"https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2016\/07\/Data-Flair.png","width":106,"height":48,"caption":"DataFlair"},"image":{"@id":"https:\/\/data-flair.training\/blogs\/#\/schema\/logo\/image\/"},"sameAs":["https:\/\/www.facebook.com\/DataFlairWS\/","https:\/\/x.com\/DataFlairWS","https:\/\/www.linkedin.com\/company\/dataflair-web-services-pvt-ltd\/","https:\/\/www.youtube.com\/user\/DataFlairWS"]},{"@type":"Person","@id":"https:\/\/data-flair.training\/blogs\/#\/schema\/person\/beb0cab24b7aa54423a3b50e669a9dcd","name":"DataFlair Team","image":{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/secure.gravatar.com\/avatar\/c322416204232f4dd97ef3901b0a499a5d34d7ba7fe333f4bfe53a907873d293?s=96&d=mm&r=g","url":"https:\/\/secure.gravatar.com\/avatar\/c322416204232f4dd97ef3901b0a499a5d34d7ba7fe333f4bfe53a907873d293?s=96&d=mm&r=g","contentUrl":"https:\/\/secure.gravatar.com\/avatar\/c322416204232f4dd97ef3901b0a499a5d34d7ba7fe333f4bfe53a907873d293?s=96&d=mm&r=g","caption":"DataFlair Team"},"description":"DataFlair Team specializes in creating clear, actionable content on programming, Java, Python, C++, DSA, AI, ML, data Science, Android, Flutter, MERN, Web Development, and technology. 
Backed by industry expertise, we make learning easy and career-oriented for beginners and pros alike.","url":"https:\/\/data-flair.training\/blogs\/author\/dfteam3\/"}]}},"amp_enabled":true,"_links":{"self":[{"href":"https:\/\/data-flair.training\/blogs\/wp-json\/wp\/v2\/posts\/20016","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/data-flair.training\/blogs\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/data-flair.training\/blogs\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/data-flair.training\/blogs\/wp-json\/wp\/v2\/users\/7"}],"replies":[{"embeddable":true,"href":"https:\/\/data-flair.training\/blogs\/wp-json\/wp\/v2\/comments?post=20016"}],"version-history":[{"count":0,"href":"https:\/\/data-flair.training\/blogs\/wp-json\/wp\/v2\/posts\/20016\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/data-flair.training\/blogs\/wp-json\/wp\/v2\/media\/20777"}],"wp:attachment":[{"href":"https:\/\/data-flair.training\/blogs\/wp-json\/wp\/v2\/media?parent=20016"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/data-flair.training\/blogs\/wp-json\/wp\/v2\/categories?post=20016"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/data-flair.training\/blogs\/wp-json\/wp\/v2\/tags?post=20016"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}