Sqoop Codegen – Arguments & Commands in Codegen with Example

Sqoop Codegen is a tool that generates Java classes which encapsulate and interpret imported records. In this article, we cover the purpose of Codegen in Sqoop, its syntax, its arguments and commands, and an example invocation.

Introduction to Sqoop Codegen and its Purpose

The Codegen tool generates Java classes that encapsulate and interpret imported records. The Java definition of a record is normally instantiated as part of the Sqoop import process, but it can also be generated separately. For example, if the Java source is lost, it can be recreated with Codegen. We can also create new versions of a class that use different delimiters between fields, and so on.

Codegen Syntax in Sqoop

$ sqoop codegen (generic-args) (codegen-args)
$ sqoop-codegen (generic-args) (codegen-args)
The Hadoop generic arguments must precede any Codegen arguments. The Codegen arguments themselves can be entered in any order with respect to one another.
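
To make the ordering rule concrete, here is a small dry-run sketch in shell. It only prints the command line and does not call Sqoop. The `-D` property is a placeholder, and the JDBC URI and table name are borrowed from the example invocation later in this article.

```shell
#!/bin/sh
# Dry-run sketch: prints a codegen command line, it does not call Sqoop.

# Hadoop generic arguments (placeholder property) come first...
GENERIC_ARGS="-D some.property=value"
# ...followed by the codegen arguments, whose relative order is free.
CODEGEN_ARGS="--table employees --connect jdbc:mysql://db.example.com/corp"

codegen_command() {
    printf 'sqoop codegen %s %s\n' "$GENERIC_ARGS" "$CODEGEN_ARGS"
}

codegen_command
```

Here `--table` comes before `--connect`, which is fine because both are Codegen arguments. Only the `-D` option has to stay ahead of them.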

Sqoop Codegen Arguments

a. Common arguments in Sqoop Codegen

Argument                              Description
--connect <jdbc-uri>                  Specify JDBC connect string
--connection-manager <class-name>     Specify connection manager class to use
--driver <class-name>                 Manually specify JDBC driver class to use
--hadoop-mapred-home <dir>            Override $HADOOP_MAPRED_HOME
--help                                Print usage instructions
--password-file                       Set path for a file containing the authentication password
-P                                    Read password from console
--password <password>                 Set authentication password
--username <username>                 Set authentication username
--verbose                             Print more information while working
--connection-param-file <filename>    Optional properties file that provides connection parameters
--relaxed-isolation                   Set connection transaction isolation to read uncommitted for the mappers
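
As a rough illustration of how the authentication arguments fit together, the sketch below picks `--password-file` when a file path is given and falls back to `-P` otherwise. Passing the password directly with `--password` leaves it visible on the command line. The user name `dbuser` and the password-file path are hypothetical, and the script only prints the command instead of running Sqoop.

```shell
#!/bin/sh
# Dry-run sketch: chooses how the password is supplied and prints the command.
# The user name and the password-file path are hypothetical.
auth_args() {
    # $1 = user name, $2 = optional path to a password file
    if [ -n "$2" ]; then
        printf '%s' "--username $1 --password-file $2"
    else
        # -P makes Sqoop read the password from the console.
        printf '%s' "--username $1 -P"
    fi
}

printf 'sqoop codegen --connect %s %s --table %s\n' \
    "jdbc:mysql://db.example.com/corp" "$(auth_args dbuser)" "employees"
```

Calling `auth_args dbuser /user/dbuser/.db-password` instead would produce the `--password-file` variant.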

b. Code generation arguments in Sqoop Codegen

Argument                  Description
--bindir <dir>            Output directory for compiled objects
--class-name <name>       Sets the generated class name. This overrides --package-name. When combined with --jar-file, sets the input class.
--jar-file <file>         Disable code generation; use specified jar
--outdir <dir>            Output directory for generated code
--package-name <name>     Put auto-generated classes in this package
--map-column-java <m>     Override default mapping from SQL type to Java type for configured columns
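
The following dry-run sketch shows one way these arguments might be combined to keep generated sources and compiled objects in separate directories. The directory names, the class name, and the `id` column in the type mapping are all hypothetical. The script prints the command and does not call Sqoop.

```shell
#!/bin/sh
# Dry-run sketch: prints a codegen command that separates generated sources
# from compiled objects. All names below are hypothetical.
OUTDIR="src/generated"                  # target for generated code
BINDIR="build/classes"                  # target for compiled objects
CLASS_NAME="com.example.corp.Employee"  # name of the generated class
COLUMN_MAP="id=String"                  # <column>=<Java type>, comma-separated

codegen_with_dirs() {
    printf 'sqoop codegen --connect %s --table %s --outdir %s --bindir %s --class-name %s --map-column-java %s\n' \
        "jdbc:mysql://db.example.com/corp" "employees" \
        "$OUTDIR" "$BINDIR" "$CLASS_NAME" "$COLUMN_MAP"
}

codegen_with_dirs
```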

c. Output line formatting arguments in Sqoop Codegen

Argument                          Description
--enclosed-by <char>              Sets a required field enclosing character
--escaped-by <char>               Sets the escape character
--fields-terminated-by <char>     Sets the field separator character
--lines-terminated-by <char>      Sets the end-of-line character
--mysql-delimiters                Uses MySQL's default delimiter set: fields: , lines: \n escaped-by: \ optionally-enclosed-by: '
--optionally-enclosed-by <char>   Sets a field enclosing character
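
One use of these arguments is creating a new version of a record class with different delimiters, as mentioned in the introduction. The sketch below prints a command that would regenerate the class for tab-separated fields. It is a dry run under assumed connection details and does not call Sqoop. The helper name `delimited_codegen` is made up for this example.

```shell
#!/bin/sh
# Dry-run sketch: prints a codegen command that regenerates the record class
# with caller-chosen delimiters. It does not call Sqoop.
delimited_codegen() {
    # $1 = field separator, $2 = end-of-line character, both printed verbatim
    printf "sqoop codegen --connect %s --table %s --fields-terminated-by '%s' --lines-terminated-by '%s'\n" \
        "jdbc:mysql://db.example.com/corp" "employees" "$1" "$2"
}

# Single quotes keep the backslash escapes intact for Sqoop to interpret.
delimited_codegen '\t' '\n'
```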

d. Input parsing arguments in Sqoop Codegen

Argument                                Description
--input-enclosed-by <char>              Sets a required field encloser
--input-escaped-by <char>               Sets the input escape character
--input-fields-terminated-by <char>     Sets the input field separator
--input-lines-terminated-by <char>      Sets the input end-of-line character
--input-optionally-enclosed-by <char>   Sets a field enclosing character

e. Hive arguments in Sqoop Codegen

Argument                    Description
--hive-home <dir>           Override $HIVE_HOME
--hive-import               Import tables into Hive (uses Hive's default delimiters if none are set)
--hive-overwrite            Overwrite existing data in the Hive table
--create-hive-table         If set, the job will fail if the target Hive table exists. By default this property is false.
--hive-table <table-name>   Sets the table name to use when importing to Hive
--hive-drop-import-delims   Drops \n, \r, and \01 from string fields when importing to Hive
--hive-delims-replacement   Replaces \n, \r, and \01 in string fields with a user-defined string when importing to Hive
--hive-partition-key        Name of the Hive field on which partitions are sharded
--hive-partition-value <v>  String value that serves as the partition key for the data imported into Hive in this job
--map-column-hive <map>     Override default mapping from SQL type to Hive type for configured columns

If Hive arguments are provided to the code generation tool, Sqoop also generates a file containing the HQL statements to create a table and load data.
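
A minimal dry-run sketch of passing Hive arguments to Codegen is shown below. It assumes that `--hive-import` and `--hive-table` are the Hive arguments you need. The Hive table name `corp_employees` is hypothetical, and the script only prints the command.

```shell
#!/bin/sh
# Dry-run sketch: prints a codegen command that also passes Hive arguments.
# The Hive table name is hypothetical. It does not call Sqoop.
hive_codegen() {
    # $1 = name of the target Hive table
    printf 'sqoop codegen --connect %s --table %s --hive-import --hive-table %s\n' \
        "jdbc:mysql://db.example.com/corp" "employees" "$1"
}

hive_codegen corp_employees
```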

Sqoop Codegen Commands

The commands below prepare the environment for running the Codegen tool. In object-oriented terms, every database table has one DAO class that carries getter and setter methods for its fields, and Codegen is the tool that generates this class.
a. Change the directory to /usr/local/hadoop/sbin
$ cd /usr/local/hadoop/sbin
b. Start all Hadoop daemons
$ ./start-all.sh
c. Check that the daemons are running with jps (the Java Virtual Machine Process Status tool). Note that jps only reports on JVMs for which it has access permissions.
$ jps
d. Change the directory to /usr/local/sqoop/bin
$ cd /usr/local/sqoop/bin
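
The check in step c can be scripted. The sketch below reads `jps` output from standard input and prints the name of every expected daemon that is missing. The list of expected daemons (NameNode and DataNode) is an assumption, so adjust it to your Hadoop version and node role. The helper name `missing_daemons` is made up for this example.

```shell
#!/bin/sh
# Reads `jps` output on stdin and prints each expected daemon that is missing.
# The daemon list is an assumption; adjust it to your cluster.
missing_daemons() {
    jps_output=$(cat)
    for daemon in NameNode DataNode; do
        # jps prints "<pid> <name>", so compare the second field exactly;
        # this keeps SecondaryNameNode from being counted as NameNode.
        if ! printf '%s\n' "$jps_output" |
            awk -v d="$daemon" '$2 == d { found = 1 } END { exit !found }'; then
            printf '%s\n' "$daemon"
        fi
    done
}

# Demo with canned output so the sketch runs without Hadoop installed:
printf '4821 NameNode\n4990 SecondaryNameNode\n5234 Jps\n' | missing_daemons
```

On a live node the usage would be `jps | missing_daemons`. Empty output means every expected daemon was found.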

Codegen Example Invocations in Sqoop

Here, we recreate the record interpretation code for the employees table of a corporate database:
$ sqoop codegen --connect jdbc:mysql://db.example.com/corp \
   --table employees

Conclusion

In this article, we introduced Sqoop Codegen and covered its purpose and syntax. We also went through the arguments and commands associated with it, along with an example invocation. If you have any questions about Sqoop Codegen, feel free to ask in the comment section.
