HCatalog and Pig Integration | Accessing Pig With HCatalog

1. HCatalog and Pig Integration

In our last HCatalog tutorial, we discussed the HCatalog loader and storer. Today, we will look at HCatalog and Pig integration: how to run Pig so that it can find the HCatalog jars, followed by a worked example.
So, let’s start with HCatalog and Pig integration.


2. Running Pig with HCatalog

By default, Pig does not pick up the HCatalog jars. To bring in the necessary jars, we can either use a flag in the pig command or set the environment variables PIG_CLASSPATH and PIG_OPTS, as described below:

a. The -useHCatalog Flag

To work with HCatalog, simply include the following flag in the pig command; it brings in the appropriate jars:
pig -useHCatalog

b. Jars and Configuration Files

For Pig commands that omit -useHCatalog, we need to tell Pig where to find the HCatalog jars and the Hive jars used by the HCatalog client. To do this, we define the environment variable PIG_CLASSPATH with the appropriate jars.
HCatalog can tell us the jars it needs, but for that it has to know where Hadoop and Hive are installed. We also need to tell Pig the URI of our metastore in the PIG_OPTS variable.
If Hadoop and Hive were installed from tar archives, we can set this up as follows:
export HADOOP_HOME=<path_to_hadoop_install>
export HIVE_HOME=<path_to_hive_install>
export HCAT_HOME=<path_to_hcat_install>
export PIG_CLASSPATH=$HCAT_HOME/share/hcatalog/hcatalog-core*.jar:\
$HCAT_HOME/share/hcatalog/hcatalog-pig-adapter*.jar:\
$HIVE_HOME/lib/hive-metastore-*.jar:$HIVE_HOME/lib/libthrift-*.jar:\
$HIVE_HOME/lib/hive-exec-*.jar:$HIVE_HOME/lib/libfb303-*.jar:\
$HIVE_HOME/lib/jdo2-api-*-ec.jar:$HIVE_HOME/conf:$HADOOP_HOME/conf:\
$HIVE_HOME/lib/slf4j-api-*.jar
export PIG_OPTS=-Dhive.metastore.uris=thrift://<hostname>:<port>
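Note that the shell does not expand * inside a variable assignment, so the PIG_CLASSPATH value exported above still contains the literal patterns; whether they are resolved later depends on the launcher script. The following is a minimal sketch of one way to resolve them up front, assuming the directory layout shown above and paths without spaces. The helper name build_pig_classpath is hypothetical and is not part of Pig or HCatalog.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of Pig or HCatalog): expand each glob
# pattern passed as an argument and join the matches with ":".
build_pig_classpath() {
  local cp="" pattern match
  for pattern in "$@"; do
    # Left unquoted on purpose so the shell expands the wildcard here.
    for match in $pattern; do
      if [ -e "$match" ]; then
        cp="${cp:+$cp:}$match"
      else
        echo "warning: nothing matches $pattern" >&2
      fi
    done
  done
  printf '%s\n' "$cp"
}

# Only build the classpath when the install locations are known.
if [ -n "${HCAT_HOME:-}" ] && [ -n "${HIVE_HOME:-}" ] && [ -n "${HADOOP_HOME:-}" ]; then
  PIG_CLASSPATH=$(build_pig_classpath \
    "$HCAT_HOME/share/hcatalog/hcatalog-core*.jar" \
    "$HCAT_HOME/share/hcatalog/hcatalog-pig-adapter*.jar" \
    "$HIVE_HOME/lib/hive-metastore-*.jar" "$HIVE_HOME/lib/libthrift-*.jar" \
    "$HIVE_HOME/lib/hive-exec-*.jar" "$HIVE_HOME/lib/libfb303-*.jar" \
    "$HIVE_HOME/lib/jdo2-api-*-ec.jar" "$HIVE_HOME/conf" "$HADOOP_HOME/conf" \
    "$HIVE_HOME/lib/slf4j-api-*.jar")
  export PIG_CLASSPATH
fi
```

Each pattern is passed in quotes so that it is expanded inside the helper, which lets the helper warn about any jar that is missing from the installation.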
Alternatively, we can pass the jars on the command line:

<path_to_pig_install>/bin/pig -Dpig.additional.jars=\
$HCAT_HOME/share/hcatalog/hcatalog-core*.jar:\
$HCAT_HOME/share/hcatalog/hcatalog-pig-adapter*.jar:\
$HIVE_HOME/lib/hive-metastore-*.jar:$HIVE_HOME/lib/libthrift-*.jar:\
$HIVE_HOME/lib/hive-exec-*.jar:$HIVE_HOME/lib/libfb303-*.jar:\
$HIVE_HOME/lib/jdo2-api-*-ec.jar:$HIVE_HOME/lib/slf4j-api-*.jar  <script.pig>

In each file path, replace * with the version number found in your installation. For example, release 0.5.0 of HCatalog uses the following jars and conf files:
$HCAT_HOME/share/hcatalog/hcatalog-core-0.5.0.jar
$HCAT_HOME/share/hcatalog/hcatalog-pig-adapter-0.5.0.jar
$HIVE_HOME/lib/hive-metastore-0.10.0.jar
$HIVE_HOME/lib/libthrift-0.7.0.jar
$HIVE_HOME/lib/hive-exec-0.10.0.jar
$HIVE_HOME/lib/libfb303-0.7.0.jar
$HIVE_HOME/lib/jdo2-api-2.3-ec.jar
$HIVE_HOME/conf
$HADOOP_HOME/conf
$HIVE_HOME/lib/slf4j-api-1.6.1.jar

c. Authentication

If you are using a secure cluster and a failure results in a message like “2010-11-03 16:17:28,225 WARN hive.metastore … - Unable to connect metastore with URI thrift://…” in /tmp/<username>/hive.log, make sure you have run “kinit <username>@FOO.COM” to get a Kerberos ticket, so that you can authenticate to the HCatalog server.
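This check can also be scripted. The following is a minimal sketch, assuming the log location given above and an MIT Kerberos klist, whose -s option prints nothing and reports ticket validity through its exit status. Both function names are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: count metastore connection failures in a Hive log.
# Prints 0 when the log is missing or contains no such message.
metastore_failures() {
  local log="$1"
  [ -r "$log" ] || { echo 0; return 0; }
  grep -c 'Unable to connect metastore' "$log" || true
}

# Hypothetical helper: succeed only if a valid Kerberos ticket is cached.
# Assumes MIT Kerberos, where `klist -s` is silent and sets the exit status.
have_kerberos_ticket() {
  command -v klist >/dev/null 2>&1 && klist -s
}

log="/tmp/${USER:-$(id -un)}/hive.log"
if [ "$(metastore_failures "$log")" -gt 0 ] && ! have_kerberos_ticket; then
  echo "No valid Kerberos ticket found; run: kinit <username>@FOO.COM" >&2
fi
```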


3. Example of HCatalog and Pig Integration

For example, suppose we have a file named employee_details.txt in HDFS with the following content:
employee_details.txt
001, Mehul, Chourey, 21, 9848022337, Hyderabad
002, Prerna, Tripathi, 22, 9848022338, Chennai
003, Shreyash, Tiwari, 22, 9848022339, Delhi
004, Kajal, Jain, 21, 9848022330, Goa
005, Revti, Vadjikar, 23, 9848022336, Banglore
006, Rishabh, Jaiswal, 23, 9848022335, Pune
007, Sagar, Joshi, 24, 9848022334, Mumbai
008, Vaishnavi, Dubey, 24, 9848022333, Indore
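Each row above has six comma-separated fields (id, first name, last name, age, phone, city), and the schema given when loading the file has to account for all of them. The following is a small sketch for checking a local copy of the file before it is uploaded; the function name count_bad_rows is hypothetical.

```shell
# Hypothetical helper: print how many rows of a comma-separated file
# have a field count different from the expected one.
#   $1 = file to check, $2 = expected number of fields
count_bad_rows() {
  awk -F',' -v want="$2" 'NF != want { bad++ } END { print bad + 0 }' "$1"
}
```

For instance, count_bad_rows employee_details.txt 6 prints 0 when every row has exactly six fields.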
Now, in the same HDFS directory, we have a sample script named sample1_script.pig. It contains statements that perform operations and transformations on the employee relation:
employee = LOAD 'hdfs://localhost:9000/pig_data/employee_details.txt' USING
PigStorage(',') AS (id:int, firstname:chararray, lastname:chararray, age:int,
phone:chararray, city:chararray);
employee_order = ORDER employee BY age DESC;
-- Package name as shipped with HCatalog 0.5.0; newer releases bundled with Hive use org.apache.hive.hcatalog.pig.HCatStorer
STORE employee_order INTO 'employee_order_table' USING org.apache.hcatalog.pig.HCatStorer();
employee_limit = LIMIT employee_order 4;
DUMP employee_limit;
The first statement of the script loads the data in the file employee_details.txt as a relation named employee.
The second statement arranges the tuples of the relation in descending order of age and stores the result as employee_order.
The third statement stores the processed data employee_order in a separate table named employee_order_table.
The fourth statement stores the first four tuples of employee_order as employee_limit.
Finally, the fifth statement dumps the content of the relation employee_limit.
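HCatStorer writes into a table that already exists in the metastore, so employee_order_table has to be created before the script runs. The following is a minimal sketch, assuming Hive column types that mirror the Pig schema (int as INT, chararray as STRING) and the -e option of the hcat command-line tool. The function name employee_order_ddl is hypothetical.

```shell
# Hypothetical helper: print the DDL for the target table.
# Column names and types are assumptions that mirror the Pig schema.
employee_order_ddl() {
  cat <<'EOF'
CREATE TABLE IF NOT EXISTS employee_order_table (
  id INT,
  firstname STRING,
  lastname STRING,
  age INT,
  phone STRING,
  city STRING
);
EOF
}

# Usage, on a machine where the hcat CLI is installed:
#   hcat -e "$(employee_order_ddl)"
```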
Next, execute sample1_script.pig as follows:

$ ./pig -useHCatalog hdfs://localhost:9000/pig_data/sample1_script.pig

Then check the output directory (hdfs: user/tmp/hive) for the output files (part_0000, part_0001).
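To sanity-check the result, the ordering and limit steps of the script can be approximated locally. The following is a rough sketch, assuming a local copy of employee_details.txt; the function name top_by_age is hypothetical. Pig does not guarantee the order of rows that share the same age, so only the ages are comparable, not the exact rows.

```shell
# Hypothetical helper: sort a comma-separated file by its 4th field (age),
# numerically descending, and keep the first N rows.
#   $1 = file, $2 = number of rows to keep
top_by_age() {
  sort -t',' -k4,4nr "$1" | head -n "$2"
}
```

For the sample data, top_by_age employee_details.txt 4 returns the two employees aged 24 followed by the two aged 23.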
So, this was all about HCatalog and Pig integration. We hope it helps.

4. Conclusion

Hence, we have seen the concept of HCatalog and Pig integration in detail. We also discussed how to run Pig with HCatalog and worked through an example. Still, if you have any doubt regarding HCatalog and Pig integration, ask in the comment section.