Sqoop Job – Creating And Executing | Saved Jobs
In this Sqoop tutorial, we discuss the Sqoop job, which allows us to create and work with saved jobs in Sqoop. We start with a brief introduction to saved jobs, then cover the purpose and syntax of a Sqoop job, how to create a job, and how saved jobs support incremental import.
2. Saved Jobs in Sqoop
Imports and exports in Sqoop can be repeated by issuing the same command multiple times. This is an especially common scenario when using the incremental import capability.
Sqoop makes this process easier with saved jobs. A saved job records the configuration information required to execute a Sqoop command at a later time.
Note that job descriptions are saved to a private repository stored in $HOME/.sqoop/ by default. Sqoop can instead be configured to use a shared metastore, which makes saved jobs available to multiple users across a shared cluster.
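As a sketch of the shared setup, a metastore service can be hosted with the sqoop-metastore tool, and clients on other machines then connect to it with --meta-connect (the host name here is illustrative; 16000 is the default metastore port):

```shell
# On the metastore host: start the shared HSQLDB-backed metastore service
$ sqoop metastore

# On a client: save a job into the shared metastore instead of $HOME/.sqoop/
$ sqoop job --meta-connect jdbc:hsqldb:hsql://metastore.example.com:16000/sqoop \
    --create myjob -- import --connect jdbc:mysql://example.com/db --table mytable
```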
Learn more about Sqoop Codegen
3. What is Sqoop Job?
A Sqoop job allows us to create and work with saved jobs. Saved jobs remember the parameters used to specify a job, so we can re-execute a job later simply by invoking it by its handle. This re-execution is particularly useful for incremental import, which imports only the updated rows from an RDBMS table to HDFS.
If a saved job is configured to perform an incremental import, the state regarding the most recently imported rows is updated in the saved job after each run. This allows the job to continually import only the newest rows.
4. Syntax of Sqoop Job
$ sqoop job (generic-args) (job-args) [-- [subtool-name] (subtool-args)]
$ sqoop-job (generic-args) (job-args) [-- [subtool-name] (subtool-args)]
The Sqoop job arguments can be entered in any order with respect to one another, but the Hadoop generic arguments must precede any job arguments.
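For instance, a Hadoop generic -D argument must appear before the job arguments (a sketch; the property shown is optional and discussed later in this article):

```shell
# Generic Hadoop arguments (such as -D) come first, then the job arguments
$ sqoop job -D sqoop.metastore.client.record.password=true --list
```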
Table 1. Job management options
| --create <job-id> | Define a new saved job with the specified job-id (name). A second Sqoop command-line, separated by a --, should be specified; this defines the saved job. |
| --delete <job-id> | Delete a saved job. |
| --exec <job-id> | Given a job defined with --create, run the saved job. |
| --show <job-id> | Show the parameters for a saved job. |
| --list | List all saved jobs. |
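For instance, the saved job myjob created in the next section can later be removed with the --delete action (a sketch):

```shell
# Remove the saved job definition from the metastore; imported data is untouched
$ sqoop job --delete myjob
```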
5. How to Create Sqoop Job
We create a saved job with the --create action. This operation requires a -- followed by a tool name and its arguments; the tool and its arguments form the basis of the saved job. Consider:
$ sqoop job --create myjob -- import --connect jdbc:mysql://example.com/db \
    --table mytable
This creates a job named myjob which we can execute later; note that the job is not run at creation time. The job is now available in the list of saved jobs:
$ sqoop job --list
Read about Sqoop Eval
6. Inspect Job in Sqoop
By using the --show action we can inspect the configuration of a job:
$ sqoop job --show myjob
Job: myjob
Tool: import
Options:
----------------------------
direct.import = false
codegen.input.delimiters.record = 0
hdfs.append.dir = false
db.table = mytable
...
If we are satisfied with the job's configuration, we can run it with the --exec action:
$ sqoop job --exec myjob
10/08/19 13:08:45 INFO tool.CodeGenTool: Beginning code generation
The --exec action also allows us to override arguments of the saved job by supplying them after a --.
For example, if the database were changed to require a username, we could specify the username and password with:
$ sqoop job --exec myjob -- --username someuser -P
Table 2. Metastore connection options
| --meta-connect <jdbc-uri> | Specifies the JDBC connect string used to connect to the metastore. |
This address can also be configured as sqoop.metastore.client.autoconnect.url in conf/sqoop-site.xml, in which case there is no need to supply --meta-connect to use a remote metastore. The same parameter can be modified to move the private metastore to a location on the filesystem other than the home directory.
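A sketch of the corresponding entry in conf/sqoop-site.xml (the metastore host and port are illustrative):

```xml
<property>
  <name>sqoop.metastore.client.autoconnect.url</name>
  <value>jdbc:hsqldb:hsql://metastore.example.com:16000/sqoop</value>
</property>
```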
Learn more about Sqoop Metastore in detail
One condition applies, however: we must explicitly supply --meta-connect if sqoop.metastore.client.enable.autoconnect is configured with the value false.
Table 3. Common options
| --help | Print usage instructions. |
| --verbose | Print more information while working. |
7. Sqoop Saved Jobs and Passwords
Since multiple users can access the Sqoop metastore, it is not a secure resource, and Sqoop therefore does not store passwords in it. For security, if we create a Sqoop job that requires a password, we will be prompted for that password each time we execute the job.
Passwords can be enabled in the metastore by setting sqoop.metastore.client.record.password to true in the configuration.
Note: If we execute saved jobs via Oozie, we have to set sqoop.metastore.client.record.password to true, because Sqoop cannot prompt the user to enter passwords when executed as an Oozie task.
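A sketch of this setting in conf/sqoop-site.xml:

```xml
<property>
  <name>sqoop.metastore.client.record.password</name>
  <value>true</value>
</property>
```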
8. Sqoop Job Incremental Imports
All Sqoop incremental imports are performed by comparing the values in a check column against a reference value from the most recent import. When run as a saved job, Sqoop records the most recently imported value automatically, so each execution picks up where the previous one left off.
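As a sketch, a saved job for an append-mode incremental import keyed on an auto-increment column (the job name, connect string, and column name are illustrative):

```shell
# Save the incremental import; nothing is imported at creation time
$ sqoop job --create incr_import -- import \
    --connect jdbc:mysql://example.com/db \
    --table mytable \
    --incremental append \
    --check-column id \
    --last-value 0

# Each execution imports only rows with id greater than the last recorded value;
# the saved job updates that value automatically after a successful run
$ sqoop job --exec incr_import
```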
This completes our coverage of Sqoop jobs. We have seen how to create a Sqoop job, how saved jobs support incremental import, the purpose and syntax of a Sqoop job, and its essential arguments. If you still have any doubts, feel free to ask in the comment section; we will get back to you.
See Also: Sqoop Merge