Sqoop Codegen – Arguments & Commands in Codegen with Example
Sqoop Codegen is a tool that generates Java classes which encapsulate and interpret imported records. In this article, we will cover the purpose of Codegen in Sqoop, its syntax, its arguments, and an example invocation.
Introduction to Sqoop Codegen and its Purpose
The sqoop-codegen tool generates Java classes that encapsulate and interpret imported records. The Java definition of a record is normally created as part of the Sqoop import process, but it can also be generated separately. For example, we can recreate the Java source if it gets lost, or create a new version of a class that uses different delimiters between fields.
Codegen Syntax in Sqoop
$ sqoop codegen (generic-args) (codegen-args)
$ sqoop-codegen (generic-args) (codegen-args)
The Codegen arguments can be entered in any order with respect to one another, but the Hadoop generic arguments must precede any Codegen arguments.
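The ordering rule above can be sketched as a small dry run. This is only an illustration: the helper name codegen_cmdline is hypothetical, the -D property is a placeholder, and the command is printed rather than executed because db.example.com is not a real host.

```shell
#!/bin/sh
# Assemble a codegen command line with the Hadoop generic arguments first.
# $1 = generic arguments, $2 = codegen arguments (both given as strings).
# codegen_cmdline is a hypothetical helper, not part of Sqoop.
codegen_cmdline() {
    printf 'sqoop codegen %s %s\n' "$1" "$2"
}

# Placeholder generic argument; -D property=value is the generic Hadoop form.
generic_args='-D property.name=property.value'

# Codegen arguments; these two may appear in either order among themselves.
tool_args='--connect jdbc:mysql://db.example.com/corp --table employees'

# Dry run: print the command instead of running it.
codegen_cmdline "$generic_args" "$tool_args"
```

Swapping the two parameters would place --connect before -D, which is the ordering the rule above forbids.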
Sqoop Codegen Arguments
a. Common arguments in Sqoop Codegen
Argument | Description |
--connect <jdbc-uri> | Specify JDBC connect string |
--connection-manager <class-name> | Specify connection manager class to use |
--driver <class-name> | Manually specify JDBC driver class to use |
--hadoop-mapred-home <dir> | Override $HADOOP_MAPRED_HOME |
--help | Print usage instructions |
--password-file | Set path for a file containing the authentication password |
-P | Read password from console |
--password <password> | Set authentication password |
--username <username> | Set authentication username |
--verbose | Print more information while working |
--connection-param-file <filename> | Optional properties file that provides connection parameters |
--relaxed-isolation | Set connection transaction isolation to read uncommitted for the mappers. |
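As a rough sketch of how the --username and --password-file arguments might be combined, the snippet below writes a password to a file that only its owner can read and then prints the resulting command. The file location, user name, and password are made-up values, and the command is only printed because the database host is a placeholder.

```shell
#!/bin/sh
# Hypothetical file location and credentials, for illustration only.
pw_file="${TMPDIR:-/tmp}/sqoop-codegen.password"

# printf '%s' writes no trailing newline; Sqoop reads the whole file,
# so a trailing newline would become part of the password.
rm -f "$pw_file"
printf '%s' 'example-password' > "$pw_file"
chmod 400 "$pw_file"   # readable by the owner only

# Dry run: print the command rather than contacting the placeholder host.
echo sqoop codegen \
    --connect jdbc:mysql://db.example.com/corp \
    --username someuser \
    --password-file "$pw_file" \
    --table employees
```

Compared with --password, this keeps the password out of the command line and shell history; -P is the interactive alternative.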
b. Code generation arguments in Sqoop Codegen
Argument | Description |
--bindir <dir> | Output directory for compiled objects |
--class-name <name> | Sets the generated class name. This overrides --package-name. When combined with --jar-file, sets the input class. |
--jar-file <file> | Disable code generation; use specified jar |
--outdir <dir> | Output directory for generated code |
--package-name <name> | Put auto-generated classes in this package |
--map-column-java <m> | Override default mapping from SQL type to Java type for configured columns |
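The code generation arguments above could be combined along the following lines. This is a sketch under assumptions: the working directory, the class name com.example.corp.Employee, and the id column are all hypothetical, and the command is printed as a dry run instead of being executed.

```shell
#!/bin/sh
# Hypothetical working directories for generated source and compiled objects.
work_dir="${TMPDIR:-/tmp}/codegen-demo"
src_dir="$work_dir/src"
bin_dir="$work_dir/classes"
mkdir -p "$src_dir" "$bin_dir"

# Dry run: print the command rather than contacting the placeholder host.
# --class-name and the id=Long mapping use made-up names for illustration.
echo sqoop codegen \
    --connect jdbc:mysql://db.example.com/corp \
    --table employees \
    --outdir "$src_dir" \
    --bindir "$bin_dir" \
    --class-name com.example.corp.Employee \
    --map-column-java id=Long
```

Keeping --outdir and --bindir separate makes it easier to find the generated source apart from the compiled objects.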
c. Output line formatting arguments in Sqoop Codegen
Argument | Description |
--enclosed-by <char> | Sets a required field enclosing character |
--escaped-by <char> | Sets the escape character |
--fields-terminated-by <char> | Sets the field separator character |
--lines-terminated-by <char> | Sets the end-of-line character |
--mysql-delimiters | Uses MySQL's default delimiter set: fields: , lines: \n escaped-by: \ optionally-enclosed-by: ' |
--optionally-enclosed-by <char> | Sets a field enclosing character |
d. Input parsing arguments in Sqoop Codegen
Argument | Description |
--input-enclosed-by <char> | Sets a required field encloser |
--input-escaped-by <char> | Sets the input escape character |
--input-fields-terminated-by <char> | Sets the input field separator |
--input-lines-terminated-by <char> | Sets the input end-of-line character |
--input-optionally-enclosed-by <char> | Sets a field enclosing character |
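To illustrate how the output formatting and input parsing arguments from the two tables above might be used together, here is a dry-run sketch that regenerates a class with tab-separated output and comma-separated input. The helper show_cmd and the chosen delimiters are assumptions made for this example.

```shell
#!/bin/sh
# Print a command line instead of running it (dry run). printf '%s' leaves
# backslashes in the arguments untouched, unlike some echo implementations.
# show_cmd is a hypothetical helper, not part of Sqoop.
show_cmd() {
    printf '%s\n' "$*"
}

# Delimiters are passed as literal escape sequences that Sqoop interprets.
out_fields='\t'   # field separator used when the class writes records
in_fields=','     # field separator used when the class parses records

show_cmd sqoop codegen \
    --connect jdbc:mysql://db.example.com/corp \
    --table employees \
    --fields-terminated-by "'$out_fields'" \
    --lines-terminated-by "'\\n'" \
    --input-fields-terminated-by "'$in_fields'"
```

The single quotes in the printed command keep the shell from altering the backslash sequences before Sqoop sees them.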
e. Hive arguments in Sqoop Codegen
Argument | Description |
--hive-home <dir> | Override $HIVE_HOME |
--hive-import | Import tables into Hive (uses Hive's default delimiters if none are set). |
--hive-overwrite | Overwrite existing data in the Hive table. |
--create-hive-table | If set, the job will fail if the target Hive table exists. By default this property is false. |
--hive-table <table-name> | Sets the table name to use when importing to Hive. |
--hive-drop-import-delims | Drops \n, \r, and \01 from string fields when importing to Hive. |
--hive-delims-replacement | Replaces \n, \r, and \01 in string fields with a user-defined string when importing to Hive. |
--hive-partition-key | Name of the Hive field on which partitions are sharded. |
--hive-partition-value <v> | String value that serves as the partition key for the data imported into Hive in this job. |
--map-column-hive <map> | Override default mapping from SQL type to Hive type for configured columns. |
If Hive arguments are provided to the code generation tool, Sqoop also generates a file containing the HQL statements to create a table and load data. Without Hive arguments, no such file is generated.
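A codegen invocation that passes Hive arguments might look like the sketch below. The helper hive_codegen_cmd, the Hive table name corp_employees, and the id column are hypothetical, and the command is only printed, so this does not show what the generated HQL file contains.

```shell
#!/bin/sh
# Build a codegen command line that includes Hive arguments.
# $1 = source database table, $2 = target Hive table name.
# hive_codegen_cmd is a hypothetical helper, not part of Sqoop.
hive_codegen_cmd() {
    printf '%s\n' "sqoop codegen --connect jdbc:mysql://db.example.com/corp --table $1 --hive-table $2 --map-column-hive id=BIGINT"
}

# Dry run with a made-up Hive table name.
hive_codegen_cmd employees corp_employees
```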
Sqoop Codegen Commands
The commands below prepare the environment for running the Codegen tool. Basically, every database table has one DAO class that carries getter and setter methods, and Codegen generates that class.
a. Change the directory to /usr/local/hadoop/sbin
$ cd /usr/local/hadoop/sbin
b. Start all Hadoop daemons
$ start-all.sh
c. Check the running daemons with jps (the Java Virtual Machine Process Status tool). Note that jps only reports on JVMs for which it has access permissions
$ jps
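The jps check can be made a little more systematic with a small filter like the one below. It is only a sketch: the helper name missing_daemons is hypothetical, and the list of expected daemons assumes a single-node Hadoop setup running HDFS and YARN, so adjust it to your cluster.

```shell
#!/bin/sh
# Read jps-style output ("<pid> <main class>") on stdin and print the name
# of each expected daemon that is absent. Prints nothing if all are present.
# The daemon list assumes a single-node HDFS + YARN setup.
missing_daemons() {
    jps_output=$(cat)
    for daemon in NameNode DataNode ResourceManager NodeManager; do
        # Compare the second column exactly, so SecondaryNameNode
        # is not mistaken for NameNode.
        printf '%s\n' "$jps_output" |
            awk -v d="$daemon" '$2 == d { found = 1 } END { exit !found }' ||
            printf '%s\n' "$daemon"
    done
}

# Run the check only when jps is available on this machine.
if command -v jps >/dev/null 2>&1; then
    jps | missing_daemons
fi
```

An empty result suggests that the expected daemons started; any printed name points to a daemon worth checking in the Hadoop logs before running Sqoop.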
d. Change the directory to /usr/local/sqoop/bin
$ cd /usr/local/sqoop/bin
Codegen Example Invocations in Sqoop
Here, we recreate the record interpretation code for the employees table of a corporate database:
$ sqoop codegen --connect jdbc:mysql://db.example.com/corp \
   --table employees
Conclusion
In this article, we covered the purpose and syntax of Sqoop Codegen, along with its arguments, the setup commands, and an example invocation. If you have any questions about Sqoop Codegen, feel free to ask in the comment section.