Sqoop Validation – Interfaces & Limitations of Sqoop Validate
Sqoop validation is a way to verify the data copied by an import or an export. In this article, we will learn the whole concept of validation in Sqoop. After the introduction, we will look at the purpose of validation in Sqoop. We will also cover the Sqoop validation interfaces, the syntax and configuration of Sqoop validation, examples, and its limitations.
2. Introduction to Sqoop Validation
a. Sqoop validation means validating the data copied by either an import or an export, by comparing the row counts of the source and the target after the copy.
b. For an import, we use this option to compare the row counts of the source and the target right after the data has been imported into HDFS.
c. If rows are added or deleted while an import is running, Sqoop tracks this change and records it in the log file.
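To make the post-copy comparison concrete, here is a minimal sketch of a row-count check. It is not Sqoop code: an in-memory SQLite table stands in for the source database, and a list of delimited text lines stands in for the files an import writes to HDFS. The function names `source_row_count`, `target_row_count` and `counts_match` are hypothetical.

```python
import sqlite3
from typing import Iterable


def source_row_count(conn: sqlite3.Connection, table: str) -> int:
    """Count rows in the source table (stand-in for the RDBMS side of the copy)."""
    # Table names cannot be bound as SQL parameters; this sketch trusts `table`.
    (count,) = conn.execute(f"SELECT COUNT(*) FROM {table}").fetchone()
    return count


def target_row_count(lines: Iterable[str]) -> int:
    """Count non-blank records in the copied text output (stand-in for HDFS files)."""
    return sum(1 for line in lines if line.strip())


def counts_match(conn: sqlite3.Connection, table: str, lines: Iterable[str]) -> bool:
    """Post-copy check: do the source and target row counts agree?"""
    return source_row_count(conn, table) == target_row_count(lines)


# A tiny in-memory "source" table and the records an import might have written.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (id INTEGER, name TEXT)")
conn.executemany("INSERT INTO employees VALUES (?, ?)", [(1, "Ann"), (2, "Bo"), (3, "Cy")])
copied = ["1,Ann", "2,Bo", "3,Cy"]

print(counts_match(conn, "employees", copied))  # True

# A row added to the source after the copy makes the counts diverge.
conn.execute("INSERT INTO employees VALUES (4, 'Di')")
print(counts_match(conn, "employees", copied))  # False
```

The check only looks at how many rows arrived, not at their contents, which is exactly the scope of row-count validation.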
3. Interfaces of Sqoop Validation
Basically, Sqoop validation is built on three interfaces:
a. ValidationThreshold: determines whether the error margin between the source and target row counts is acceptable, for example absolute, percentage tolerant, and so on. The default implementation is AbsoluteValidationThreshold, which ensures that the row counts from the source and the target are the same.
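The threshold decision described above can be sketched as a small interface with two implementations. `AbsoluteThreshold` mirrors the behaviour described for the default AbsoluteValidationThreshold (the counts must be equal), while `PercentageTolerantThreshold` is a hypothetical illustration of the "percentage tolerant" idea, not a class Sqoop ships. This is a Python model of the concept, not Sqoop's Java API.

```python
from abc import ABC, abstractmethod


class Threshold(ABC):
    """Decides whether the difference between two row counts is acceptable."""

    @abstractmethod
    def compare(self, source_count: int, target_count: int) -> bool:
        ...


class AbsoluteThreshold(Threshold):
    """Accept only when the counts are exactly equal."""

    def compare(self, source_count: int, target_count: int) -> bool:
        return source_count == target_count


class PercentageTolerantThreshold(Threshold):
    """Hypothetical: accept when the target is within `tolerance_pct` percent of the source."""

    def __init__(self, tolerance_pct: float) -> None:
        if tolerance_pct < 0:
            raise ValueError("tolerance must be non-negative")
        self.tolerance_pct = tolerance_pct

    def compare(self, source_count: int, target_count: int) -> bool:
        if source_count == 0:
            # Avoid dividing by zero: an empty source only matches an empty target.
            return target_count == 0
        diff_pct = abs(source_count - target_count) * 100 / source_count
        return diff_pct <= self.tolerance_pct


print(AbsoluteThreshold().compare(1000, 999))               # False
print(PercentageTolerantThreshold(0.5).compare(1000, 999))  # True (0.1% off)
```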
b. ValidationFailureHandler: responsible for handling failures, for example by logging an error or warning, or by aborting. The default implementation is LogOnFailureHandler, which logs a warning message to the configured logger.
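The two failure-handling styles mentioned above, logging and aborting, can be modelled as small handler classes. This is a Python sketch of the idea only; the class names `LogOnFailure`, `AbortOnFailure` and the exception `ValidationFailed` are hypothetical and do not correspond to Sqoop's Java classes or signatures.

```python
import logging

logger = logging.getLogger("validation_sketch")


class ValidationFailed(Exception):
    """Raised by the abort-style handler to stop the job."""


class LogOnFailure:
    """Log a warning and let processing continue (like the described default)."""

    def handle(self, message: str) -> bool:
        logger.warning("Validation failed: %s", message)
        return True


class AbortOnFailure:
    """Treat a failed validation as fatal by raising an exception."""

    def handle(self, message: str) -> bool:
        raise ValidationFailed(message)


logging.basicConfig(level=logging.WARNING)
LogOnFailure().handle("source=3, target=2")  # emits a WARNING log record
try:
    AbortOnFailure().handle("source=3, target=2")
except ValidationFailed as exc:
    print("aborted:", exc)  # aborted: source=3, target=2
```

Keeping the handler separate from the comparison means the same check can be strict in one job and merely advisory in another.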
c. Validator: drives the validation logic by delegating the pass/fail decision to a ValidationThreshold and the failure handling to a ValidationFailureHandler. The default implementation is RowCountValidator, which validates the row counts from the source and the target.
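The delegation described above can be sketched in a few lines: the validator owns no policy of its own and simply asks a threshold for a decision, then hands any failure to a handler. `RowCountCheck` is a hypothetical Python stand-in for a RowCountValidator-style driver; using plain callables for the threshold and handler, returning `False` on failure, and the message format are all choices made for this sketch.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class RowCountCheck:
    """Compare row counts, delegating the decision and the failure handling."""

    # Decides whether two counts are acceptable, e.g. exact equality.
    threshold: Callable[[int, int], bool]
    # Receives a description of the mismatch when the threshold rejects the counts.
    on_failure: Callable[[str], bool]

    def validate(self, source_count: int, target_count: int) -> bool:
        if self.threshold(source_count, target_count):
            return True
        # The validator does not decide what a failure means; the handler does.
        self.on_failure(f"source={source_count}, target={target_count}")
        return False


failures = []


def record_failure(message: str) -> bool:
    failures.append(message)
    return True


check = RowCountCheck(threshold=lambda s, t: s == t, on_failure=record_failure)
print(check.validate(10, 10))  # True
print(check.validate(10, 9))   # False
print(failures)                # ['source=10, target=9']
```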
4. Purpose to Validate in Sqoop
The main purpose of validation in Sqoop is to verify the data copied by either a Sqoop import or an export, by comparing the row counts of the source and the target after the copy.
5. Syntax of Sqoop Validation
$ sqoop import (generic-args) (import-args)
$ sqoop export (generic-args) (export-args)
The validation arguments (such as --validate) are passed as part of the import or export arguments.
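When Sqoop is launched from a script, the point that validation arguments are just extra import or export arguments can be seen by building the argument list programmatically. The helper `sqoop_command` below is hypothetical; it only uses the `--validate` and `--validator` flags that appear in this article and does not run anything itself. Generic arguments (for example `-D` properties) would go at the front of `tool_args`, since they must precede the tool-specific ones.

```python
from typing import List, Optional


def sqoop_command(tool: str, tool_args: List[str], validator: Optional[str] = None) -> List[str]:
    """Build an argv list for `sqoop import` or `sqoop export` with validation switched on."""
    if tool not in ("import", "export"):
        raise ValueError("validation applies to the import and export tools only")
    argv = ["sqoop", tool, *tool_args, "--validate"]
    if validator is not None:
        # Must be a fully qualified class name implementing org.apache.sqoop.validation.Validator.
        argv += ["--validator", validator]
    return argv


cmd = sqoop_command("import", ["--connect", "jdbc:mysql://db.foo.com/corp", "--table", "EMPLOYEES"])
print(" ".join(cmd))
# sqoop import --connect jdbc:mysql://db.foo.com/corp --table EMPLOYEES --validate
```

To actually launch the job, the list could be passed to `subprocess.run(cmd, check=True)` on a machine where Sqoop is installed.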
6. Sqoop Validation Configuration
The validation framework is extensible and pluggable. It comes with default implementations, but we can implement the interfaces ourselves and plug in custom implementations by passing their class names as command-line arguments, as described below.
a. Validator
| Description | Driver for validation; must implement org.apache.sqoop.validation.Validator |
| Supported values | The value must be a fully qualified class name. |
b. Validation Threshold
| Description | Drives the decision on whether the validation meets the threshold; must implement org.apache.sqoop.validation.ValidationThreshold |
| Supported values | The value must be a fully qualified class name. |
c. Validation Failure Handler
| Description | Responsible for handling failures; must implement org.apache.sqoop.validation.ValidationFailureHandler |
| Supported values | The value must be a fully qualified class name. |
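Because each setting takes a fully qualified class name, the framework has to load the class by name at run time and check that it implements the required interface. Sqoop does this in Java; the sketch below shows the same load-by-name pattern with Python's `importlib`, using standard-library classes as stand-ins for plugins. `load_plugin` is a hypothetical helper, not part of Sqoop.

```python
import importlib


def load_plugin(qualified_name: str, required_base: type) -> type:
    """Resolve 'package.module.ClassName' to a class and check it subclasses required_base."""
    module_name, _, class_name = qualified_name.rpartition(".")
    if not module_name:
        raise ValueError(f"not a fully qualified name: {qualified_name!r}")
    cls = getattr(importlib.import_module(module_name), class_name, None)
    if not isinstance(cls, type):
        raise ValueError(f"{qualified_name!r} does not name a class")
    # The analogue of "must implement org.apache.sqoop.validation.Validator".
    if not issubclass(cls, required_base):
        raise TypeError(f"{qualified_name} must implement {required_base.__name__}")
    return cls


print(load_plugin("collections.OrderedDict", dict))  # <class 'collections.OrderedDict'>
```

A bare class name is rejected up front, which is why the supported values above insist on the fully qualified form.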
7. Limitations of Sqoop Validation
Currently, validation only covers data copied from a single table into HDFS, so the current implementation has several limitations. It does not support:
- The all-tables option.
- The free-form query option.
- Data imported into Hive, HBase or Accumulo.
- Table imports that use the --where argument.
- Incremental imports.
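Since these limitations are tied to specific command-line options, a wrapper script could flag them before a job is launched with --validate. The helper below is a hypothetical sketch: it matches the `import-all-tables` tool name and the `--query`, `--hive-import`, `--hbase-table`, `--accumulo-table`, `--where` and `--incremental` flags exactly, and makes no claim about how Sqoop itself reacts to these combinations.

```python
from typing import List

# Flags tied to the options listed above as unsupported by validation.
UNSUPPORTED_FLAGS = {
    "--query": "free-form query imports",
    "--hive-import": "imports into Hive",
    "--hbase-table": "imports into HBase",
    "--accumulo-table": "imports into Accumulo",
    "--where": "table imports that use --where",
    "--incremental": "incremental imports",
}


def validation_blockers(argv: List[str]) -> List[str]:
    """Return the reasons --validate would not cover this command line (empty list if none)."""
    reasons = []
    # The all-tables case is a separate tool rather than a flag.
    if len(argv) > 1 and argv[1] == "import-all-tables":
        reasons.append("the all-tables option")
    for arg in argv:
        reason = UNSUPPORTED_FLAGS.get(arg)
        if reason is not None and reason not in reasons:
            reasons.append(reason)
    return reasons


print(validation_blockers(
    ["sqoop", "import", "--table", "EMPLOYEES", "--where", "id > 100", "--validate"]
))  # ['table imports that use --where']
```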
8. Example Invocations of Sqoop Validation
Here is a basic import of a table named EMPLOYEES in the corp database that uses validation to check the row counts:
$ sqoop import --connect jdbc:mysql://db.foo.com/corp \
--table EMPLOYEES --validate
A basic export that populates a table named bar with Sqoop validation enabled:
$ sqoop export --connect jdbc:mysql://db.example.com/foo --table bar \
--export-dir /results/bar_data --validate
Another example, which overrides the validation arguments:
$ sqoop import --connect jdbc:mysql://db.foo.com/corp --table EMPLOYEES \
--validate --validator org.apache.sqoop.validation.RowCountValidator
In this blog, we have learned the whole concept of Sqoop validation and seen various examples of validation in Sqoop. If you have any questions, please ask in the comment section and we will get back to you.