Sqoop Job – Creating And Executing | Saved Jobs

1. Objective

In this Sqoop Tutorial, we discuss what is Sqoop Job. Sqoop Job allows us to create and work with saved jobs in sqoop. First, we will start with a brief introduction to a  Sqoop Saved Job. Afterward, we will move forward to the sqoop job, we will learn purpose and syntax of a sqoop job. Also, we will cover the method to create a job in Sqoop and Sqoop Job incremental import.

Sqoop Job

Introduction to Sqoop Job & Saved Job

2. Saved Jobs in Sqoop

Basically, by issuing the same command multiple times we can perform imports and exports in sqoop repeatedly. Moreover, we can say it is a most expected scenario while using the incremental import capability.
In addition, we can define saved jobs by Sqoop. Basically, that makes this process easier. Moreover, to execute a Sqoop command at a later time we need some information that configuration information is recorded by a sqoop saved job.
Moreover, note that the job descriptions are saved to a private repository stored in $HOME/.sqoop/, by default. Also, we can configure Sqoop to instead use a shared metastore. However, that makes saved jobs available to multiple users across a shared cluster.
Learn more about Sqoop Codegen

If these professionals can make a switch to Big Data, so can you:
Rahul Doddamani Story - DataFlair
Rahul Doddamani
Java → Big Data Consultant, JDA
Follow on
Mritunjay Singh Success Story - DataFlair
Mritunjay Singh
PeopleSoft → Big Data Architect, Hexaware
Follow on
Rahul Doddamani Success Story - DataFlair
Rahul Doddamani
Big Data Consultant, JDA
Follow on
I got placed, scored 100% hike, and transformed my career with DataFlair
Enroll now
Deepika Khadri Success Story - DataFlair
Deepika Khadri
SQL → Big Data Engineer, IBM
Follow on
DataFlair Web Services
You could be next!
Enroll now

3. What is Sqoop Job?

Basically, Sqoop Job allows us to create and work with saved jobs. However, to specify a job, Saved jobs remember the parameters we use. Hence, we can re-execute them by invoking the job by its handle. However, we use this re-calling or re-executing in the incremental import. That can import the updated rows from RDBMS table to HDFS.
In other words, to perform an incremental import if a saved job is configured, then state regarding the most recently imported rows is updated in the saved job. Basically, that allows the job to continually import only the newest rows.

4. Syntax of Sqoop Job

$ sqoop job (generic-args) (job-args) [– [subtool-name] (subtool-args)]
$ sqoop-job (generic-args) (job-args) [– [subtool-name] (subtool-args)]
However, the Sqoop job arguments can be entered in any order with respect to one another. But the Hadoop generic arguments must precede any job arguments.
Table 1. Job management options

ArgumentDescription
–create <job-id>Define a new saved job with the specified job-id (name). A second Sqoop
–delete <job-id>Delete a saved job.
–exec <job-id>Given a job defined with –create, run the saved job.
–show <job-id>Show the parameters for a saved job.
–listList all saved jobs

Read about Sqoop List Tables in detail

5. How to Create Sqoop Job

By using the –create action we can create the saved job in sqoop. Moreover, this operation requires a — followed by a tool name and its arguments. Afterwards, the tool and its arguments will form the basis of the saved job. Consider:
$ sqoop job –create myjob — import –connect jdbc:mysql://example.com/db \
–table mytable
Basically, it creates a Sqoop job that we call myjob. Also, we can execute it later. Make sure the job is not to run. However, now this job is available in the list of saved jobs:
$ sqoop job –list
Available jobs:
myjob
Read about Sqoop Eval

6. Inspect Job in Sqoop

By using the show action we can inspect the configuration of a job:
$ sqoop job –show myjob
Job: myjob
Tool: import
Options:
—————————-
direct.import = false
codegen.input.delimiters.record = 0
hdfs.append.dir = false
db.table = mytable

However, we can run the job with exec also, if we are satisfied with it:
$ sqoop job –exec myjob
10/08/19 13:08:45 INFO tool.CodeGenTool: Beginning code generation

Basically, here the exec action allows us to override arguments of the saved job. It is possible by supplying them after a –.
Let’s take an example to understand, to require a username, if the database were changed, So, we could specify the username and password with:
For Example,
$ sqoop job –exec myjob — –username someuser -P
Enter password:

Table 2. Metastore connection options

ArgumentDescription
–meta-connect <jdbc-uri>Specifies the JDBC connect string used to connect to the metastore

In addition, we can configure sqoop.metastore.client.autoconnect.url with this address, in conf/sqoop-site.xml. Hence, there is no need to supply –meta-connect to use a remote metastore. Although, to move the private metastore to a location on your filesystem other than your home directory we can modify this parameter.
Let’s know about Sqoop Metastore in detail
However, there is one condition, we must explicitly supply –meta-connect if we configure sqoop.metastore.client.enable.autoconnect with the value false.
Table 3. Common options

ArgumentDescription
–helpPrint usage instructions
–verbosePrint more information while working

7. Sqoop Saved Jobs and Passwords

Basically, multiple users can access Sqoop metastore since it is not a secure resource. Hence, Sqoop does not store passwords in the metastore. So, for the security purpose, you will be prompted for that password each time you execute the job if we create a sqoop job that requires a password.
In addition, by setting sqoop.metastore.client.record.password to true in the configuration we can easily enable passwords in the metastore.
Note: If we are executing saved jobs via Oozie we have to set sqoop.metastore.client.record.password to true. It is important since when executed as Oozie tasks, Sqoop cannot prompt the user to enter passwords.

Hadoop Quiz

8. Sqoop Job Incremental Imports

Basically, by comparing the values in a check column against a reference value for the most recent import all the sqoop incremental imports are performed.

9. Conclusion

As a result, we have seen the complete content regarding Sqoop Jobs. Also, we have seen the way to create or sqoop job incremental import. Apart from that, we have also learned syntax and purpose of a sqoop job to understand. Moreover, we have seen some essential arguments also. However, still, if you feel any doubts regarding, feel free to ask in the comment section. We assure you that we will get back to you.
See Also- Sqoop Merge
For reference

Leave a Reply

Your email address will not be published. Required fields are marked *

This site is protected by reCAPTCHA and the Google Privacy Policy and Terms of Service apply.