HCatalog Loader and Storer – Usage & Example

1. HCatalog Loader and Storer

In our last HCatalog tutorial, we discussed HCatalog commands. Today, we will look at the HCatalog Loader and Storer, along with examples of each. Pig scripts use the HCatLoader and HCatStorer interfaces to read and write data in HCatalog-managed tables.
So, let’s learn both HCatalog Loader and Storer in detail:

2. HCatLoader

In order to read data from HCatalog-managed tables, we use HCatLoader with Pig scripts.

a. Usage

We access HCatLoader via a Pig load statement:

A = LOAD 'tablename' USING org.apache.hive.hcatalog.pig.HCatLoader();

b. Assumptions

Make sure the table name is specified in single quotes: LOAD 'tablename'. If we are using a non-default database, we must specify our input as 'dbname.tablename'. If we are using Pig 0.9.2 or earlier, we must create our database and table before running the Pig script; beginning with Pig 0.10, we can issue these create commands from Pig using the SQL command.
The Hive metastore also lets us create tables without specifying a database. Tables created this way belong to the 'default' database, so the database name is not required in the load statement.
If the table is partitioned, we can indicate which partitions to scan by immediately following the load statement with a partition filter statement.
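The naming and partition rules above can be modeled in a few lines of Python. This is a hypothetical sketch, not HCatalog code: resolve_table and partitions_to_scan are illustrative names, and partitions are represented as plain dicts purely for demonstration.

```python
from __future__ import annotations

# Hypothetical model of the naming rules described above; this is NOT HCatalog code.
DEFAULT_DB = "default"


def resolve_table(name: str) -> tuple[str, str]:
    """Split a quoted load target such as 'dbname.tablename' into (db, table).

    A bare 'tablename' resolves to the 'default' database.
    """
    parts = name.split(".")
    if len(parts) > 2 or not all(p.strip() for p in parts):
        raise ValueError(f"expected 'tablename' or 'dbname.tablename', got {name!r}")
    if len(parts) == 1:
        return DEFAULT_DB, parts[0]
    return parts[0], parts[1]


def partitions_to_scan(
    partitions: list[dict[str, str]], column: str, value: str
) -> list[dict[str, str]]:
    """Keep only the partitions whose partition column equals value.

    Models the effect of a partition filter placed right after the load,
    e.g. in Pig: b = filter a by datestamp == '20110924';
    """
    return [p for p in partitions if p.get(column) == value]
```

For example, resolve_table("web_logs") gives ("default", "web_logs"), while resolve_table("sales.web_logs") gives ("sales", "web_logs"); the database name sales is made up for illustration.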

c. HCatLoader Data Types

Note that HCatLoader can read only the Hive data types listed below:
Types in Hive 0.12.0 and Earlier
1. boolean
2. int
3. long
4. float
5. double
6. string
7. binary
Complex data types:
8. map (the key type must be string)
9. ARRAY<any type>
10. struct<any type fields>
Types in Hive 0.13.0 and Later
11. tinyint
12. smallint
13. date
14. timestamp
15. decimal
16. char(x)
17. varchar(x)
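To check a table's columns against these lists before writing a Pig script, a small lookup like the following can help. It is a hypothetical Python model that simply transcribes the lists above (type names exactly as listed); hcatloader_can_read is an illustrative name and not part of HCatalog.

```python
from __future__ import annotations

# Hive types HCatLoader can read, copied from the lists above.
# Hypothetical lookup model for planning only; not part of the HCatalog API.
READABLE_IN_0_12 = {
    "boolean", "int", "long", "float", "double", "string", "binary",
    "map", "array", "struct",
}
ADDED_IN_0_13 = {"tinyint", "smallint", "date", "timestamp", "decimal", "char", "varchar"}


def parse_version(version: str) -> tuple[int, ...]:
    """Turn '0.13.1' into (0, 13, 1) so versions compare numerically."""
    return tuple(int(x) for x in version.split("."))


def hcatloader_can_read(hive_type: str, hive_version: str) -> bool:
    """Return True if the column type appears in the readable lists for that Hive version."""
    t = hive_type.strip().lower()
    # Reduce 'varchar(20)' or 'map<string,int>' to its base name.
    base = t.split("(", 1)[0].split("<", 1)[0].strip()
    if base == "map" and "<" in t:
        key_type = t.split("<", 1)[1].split(",", 1)[0].strip()
        if key_type != "string":
            return False  # map keys must be string
    if base in READABLE_IN_0_12:
        return True
    return base in ADDED_IN_0_13 and parse_version(hive_version) >= (0, 13)
```

With this model, a varchar(20) column is reported as unreadable on Hive 0.12.0 but readable on 0.13.0, and map<int,string> is rejected on any version because of its non-string key.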

3. HCatStorer

Further, in order to write data to HCatalog-managed tables, we use HCatStorer with Pig scripts.

a. Usage

We access HCatStorer via a Pig store statement:

A = LOAD ...
B = FOREACH A ...
...
...
my_processed_data = ...
STORE my_processed_data INTO 'tablename'
  USING org.apache.hive.hcatalog.pig.HCatStorer();

b. Assumptions

As with HCatLoader, the table name must be in single quotes, as in STORE my_processed_data INTO 'tablename'. Both the database and the table must exist before the Pig script runs. If we are using a non-default database, we must specify the table as 'dbname.tablename'. If we are using Pig 0.9.2 or earlier, the database and table must be created before running the Pig script; beginning with Pig 0.10, we can issue these create commands from Pig using the SQL command.
The Hive metastore also lets us create tables without specifying a database. Tables created this way belong to the 'default' database, and there is no need to specify the database name in the store statement.

c. Store Examples

We can write to a non-partitioned table simply by using HCatStorer. Note that the existing contents of the table will be overwritten:

store z into 'web_data' using org.apache.hive.hcatalog.pig.HCatStorer();

To add one new partition to a partitioned table, specify the partition value in the store function. Pay attention to the quoting: the whole string must be single-quoted, with the partition key and value separated by an equals sign:

store z into 'web_data' using org.apache.hive.hcatalog.pig.HCatStorer('datestamp=20110924');

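The partition argument is just a key=value string. The Python sketch below parses it to make the format concrete. It assumes, beyond what is stated above, that several partition keys are passed as comma-separated pairs (for example 'datestamp=20110924,region=us', where region is a hypothetical key); parse_partition_spec is an illustrative helper, not the HCatStorer implementation.

```python
from __future__ import annotations


def parse_partition_spec(spec: str) -> dict[str, str]:
    """Parse a partition argument like 'datestamp=20110924' into a dict.

    Hypothetical helper. Assumption: several keys are given as comma-separated
    key=value pairs, e.g. 'datestamp=20110924,region=us'. Each pair needs
    exactly one '=' with a non-empty key and value.
    """
    partitions: dict[str, str] = {}
    for pair in spec.split(","):
        key, sep, value = pair.partition("=")
        key, value = key.strip(), value.strip()
        if not sep or not key or not value or "=" in value:
            raise ValueError(f"invalid partition specification: {spec!r}")
        partitions[key] = value
    return partitions
```

A string without an equals sign, such as 'datestamp', is rejected, which mirrors the requirement that key and value be separated by '='.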
To write into multiple partitions at once, make sure that the partition column is present in our data, then call HCatStorer with no argument:

store z into 'web_data' using org.apache.hive.hcatalog.pig.HCatStorer();
 -- datestamp must be a field in the relation z
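Writing into multiple partitions at once amounts to routing each record by the value of its partition column, which is why that column must be present in the data. The following hypothetical Python model illustrates the idea with rows as dicts; split_into_partitions is not an HCatalog function.

```python
from __future__ import annotations

from collections import defaultdict


def split_into_partitions(
    rows: list[dict[str, str]], partition_column: str
) -> dict[str, list[dict[str, str]]]:
    """Group rows by the value of their partition column (hypothetical model).

    Every row must carry the partition column; otherwise there is no way to
    tell which partition it belongs to, so a KeyError is raised.
    """
    groups: defaultdict[str, list[dict[str, str]]] = defaultdict(list)
    for row in rows:
        if partition_column not in row:
            raise KeyError(f"partition column {partition_column!r} missing from {row!r}")
        groups[row[partition_column]].append(row)
    return dict(groups)
```

Rows with datestamp 20110924 and 20110925 end up in two separate groups, much as one store statement can create two partitions; a row lacking datestamp cannot be placed and is rejected.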

d. HCatStorer Data Types
HCatStorer can write only the Pig data types listed below:
Types in Hive 0.12.0 and Earlier
1. boolean
2. int
3. long
4. float
5. double
6. chararray
7. bytearray
Complex data types:
8. map
9. bag
10. tuple
Types in Hive 0.13.0 and Later
11. short
12. datetime
13. bigdecimal
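Before running a STORE, it can be useful to check a relation's schema against the writable types above. The hypothetical Python sketch below models that check with a schema given as a field-name-to-Pig-type dict; unwritable_fields and the sample field names are illustrative, and the type lists are simply copied from above.

```python
from __future__ import annotations

# Pig types HCatStorer can write, copied from the lists above.
# Hypothetical model only; not part of the HCatalog API.
WRITABLE_IN_0_12 = {
    "boolean", "int", "long", "float", "double", "chararray", "bytearray",
    "map", "bag", "tuple",
}
ADDED_IN_0_13 = {"short", "datetime", "bigdecimal"}


def parse_version(version: str) -> tuple[int, ...]:
    """Turn '0.13.0' into (0, 13, 0) so versions compare numerically."""
    return tuple(int(x) for x in version.split("."))


def unwritable_fields(schema: dict[str, str], hive_version: str) -> list[str]:
    """Return the names of fields whose Pig type is not in the writable lists."""
    allowed = set(WRITABLE_IN_0_12)
    if parse_version(hive_version) >= (0, 13):
        allowed |= ADDED_IN_0_13
    return [name for name, pig_type in schema.items() if pig_type.strip().lower() not in allowed]
```

For a schema such as {"url": "chararray", "hits": "long", "ts": "datetime"}, the model flags ts on Hive 0.12.0 and reports nothing on 0.13.0.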

4. Conclusion: HCatalog Loader and Storer

Hence, we have seen how HCatLoader and HCatStorer let Pig scripts read from and write to HCatalog-managed tables. We hope this article clears up any doubts about the HCatalog loader and storer; if you still have questions, ask in the comments.