Sqoop Troubleshooting Tips – Apache Sqoop Known Issues
While working on Sqoop, there is the possibility that may failure encounters. So, to troubleshoot any failure there are several steps we should follow. So, in this article, “Sqoop Troubleshooting” we will learn the sqoop troubleshooting process or we can say apache sqoop known issues. Also, we will learn sqoop troubleshooting tips to troubleshoot in a better way.
General Sqoop Troubleshooting Process
While running Sqoop, there are several steps we should follow. Basically, that will help to troubleshoot any failure that we encounter. Such as:
- Generally, we can turn on verbose output by executing the same command again. It will help to identify errors. Also, by specifying the –verbose option. Basically, Sqoop troubleshooting process produces more debug output on the console. Hence we can easily inspect them.
- Also, to see if there are any specific failures recorded we can look at the task logs from Hadoop. Since failure may occur while task execution is not relayed correctly to the console.
- Basically, Sqoop troubleshooting is just that the necessary input files or input/output tables are present. Moreover, can be accessed by the user that Sqoop is executing as or connecting to the database as. Since there is a possibility that the necessary files or tables are present. However, it is not necessary that Sqoop connects can access these files.
- Generally, we need to break the job into two separate actions, to see where the problem really occurs. Especially, if you are doing a compound action try breaking them. While compound actions can be like populating a Hive table or partition in a hive.
Specific Sqoop Troubleshooting Tips
a. Oracle: Connection Reset Errors
Problem 1.
For Oracle whenever we use the default Sqoop connector, some data get transferred. However, a lot of errors are reported during the map-reduce job. Like below:
11/05/26 16:23:47 INFO mapred.JobClient: Task Id : attempt_201105261333_0002_m_000002_0, Status : FAILED
java.lang.RuntimeException: java.lang.RuntimeException: java.sql.SQLRecoverableException: IO Error: Connection reset
at com.cloudera.sqoop.mapreduce.db.DBInputFormat.setConf(DBInputFormat.java:164)
at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:62)
at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:117)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:605)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:322)
at org.apache.hadoop.mapred.Child$4.run(Child.java:268)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:396)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1115)
at org.apache.hadoop.mapred.Child.main(Child.java:262)
Caused by: java.lang.RuntimeException: java.sql.SQLRecoverableException: IO Error: Connection reset
at com.cloudera.sqoop.mapreduce.db.DBInputFormat.getConnection(DBInputFormat.java:190)
at com.cloudera.sqoop.mapreduce.db.DBInputFormat.setConf(DBInputFormat.java:159)
… 9 more
Caused by: java.sql.SQLRecoverableException: IO Error: Connection reset
at oracle.jdbc.driver.T4CConnection.logon(T4CConnection.java:428)
at oracle.jdbc.driver.PhysicalConnection.<init>(PhysicalConnection.java:536)
at oracle.jdbc.driver.T4CConnection.<init>(T4CConnection.java:228)
at oracle.jdbc.driver.T4CDriverExtension.getConnection(T4CDriverExtension.java:32)
at oracle.jdbc.driver.OracleDriver.connect(OracleDriver.java:521)
at java.sql.DriverManager.getConnection(DriverManager.java:582)
at java.sql.DriverManager.getConnection(DriverManager.java:185)
at com.cloudera.sqoop.mapreduce.db.DBConfiguration.getConnection(DBConfiguration.java:152)
at com.cloudera.sqoop.mapreduce.db.DBInputFormat.getConnection(DBInputFormat.java:184)
… 10 more
Caused by: java.net.SocketException: Connection reset
at java.net.SocketOutputStream.socketWrite(SocketOutputStream.java:96)
at java.net.SocketOutputStream.write(SocketOutputStream.java:136)
at oracle.net.ns.DataPacket.send(DataPacket.java:199)
at oracle.net.ns.NetOutputStream.flush(NetOutputStream.java:211)
at oracle.net.ns.NetInputStream.getNextPacket(NetInputStream.java:227)
at oracle.net.ns.NetInputStream.read(NetInputStream.java:175)
at oracle.net.ns.NetInputStream.read(NetInputStream.java:100)
at oracle.net.ns.NetInputStream.read(NetInputStream.java:85)
at oracle.jdbc.driver.T4CSocketInputStreamWrapper.readNextPacket(T4CSocketInputStreamWrapper.java:123)
at oracle.jdbc.driver.T4CSocketInputStreamWrapper.read(T4CSocketInputStreamWrapper.java:79)
at oracle.jdbc.driver.T4CMAREngine.unmarshalUB1(T4CMAREngine.java:1122)
at oracle.jdbc.driver.T4CMAREngine.unmarshalSB1(T4CMAREngine.java:1099)
at oracle.jdbc.driver.T4CTTIfun.receive(T4CTTIfun.java:288)
at oracle.jdbc.driver.T4CTTIfun.doRPC(T4CTTIfun.java:191)
at oracle.jdbc.driver.T4CTTIoauthenticate.doOAUTH(T4CTTIoauthenticate.java:366)
at oracle.jdbc.driver.T4CTTIoauthenticate.doOAUTH(T4CTTIoauthenticate.java:752)
at oracle.jdbc.driver.T4CConnection.logon(T4CConnection.java:366)
… 18 more
Solution.
Since we lack a fast random number generation device on the host this type of problem occurs. Especially where the map tasks execute. Moreover, we can address this on typical Linux systems. It is possible by setting the following property in the java.security file.
securerandom.source=file:/dev/../dev/urandom
Under $JAVA_HOME/jre/lib/security directory we can find the java.security file. Also, we can specify this property on the command line. By:
-D mapred.child.java.opts=”-Djava.security.egd=file:/dev/../dev/urandom”
Note: Basically, it is essential to specify this weird path /dev/../dev/urandom as it is due to a Java bug 6202721. Else /dev/urandom will be ignored and substituted by /dev/random.
b. Oracle: Case-Sensitive Catalog Query Errors
Problem 2.
When Sqoop cannot figure out column names while working with Oracle. It may encounter problems. Although, there is one major reason behind this scenario. That is the catalog queries which Sqoop uses for Oracle. Expect the correct case to be specified for the username as well as the table name.
For example,
By using –hive-import and resulting in a NullPointerException:
1/09/21 17:18:49 INFO manager.OracleManager: Time zone has been set to
GMT
11/09/21 17:18:49 DEBUG manager.SqlManager: Using fetchSize for next
query: 1000
11/09/21 17:18:49 INFO manager.SqlManager: Executing SQL statement:
SELECT t.* FROM addlabel_pris t WHERE 1=0
11/09/21 17:18:49 DEBUG manager.OracleManager$ConnCache: Caching
released connection for jdbc:oracle:thin:
11/09/21 17:18:49 ERROR sqoop.Sqoop: Got exception running Sqoop:
java.lang.NullPointerException
java.lang.NullPointerException
at com.cloudera.sqoop.hive.TableDefWriter.getCreateTableStmt(TableDefWriter.java:148)
at com.cloudera.sqoop.hive.HiveImport.importTable(HiveImport.java:187)
at com.cloudera.sqoop.tool.ImportTool.importTable(ImportTool.java:362)
at com.cloudera.sqoop.tool.ImportTool.run(ImportTool.java:423)
at com.cloudera.sqoop.Sqoop.run(Sqoop.java:144)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
at com.cloudera.sqoop.Sqoop.runSqoop(Sqoop.java:180)
at com.cloudera.sqoop.Sqoop.runTool(Sqoop.java:219)
at com.cloudera.sqoop.Sqoop.runTool(Sqoop.java:228)
at com.cloudera.sqoop.Sqoop.main(Sqoop.java:237)
Solution:
Make sure we specify the username, that Sqoop is connecting as, in upper case. Unless it was created with mixed/lower case within quotes.
Again ensure we need to specify the table name, that we are working with, in upper case. Unless it was created with mixed/lower case within quotes.
c. MySQL: Connection Failure
Problem 3.
It is possible to get the below connection failure. While importing a MySQL table into Sqoop if we do not have the permissions to access our MySQL database.
Caused by: com.mysql.jdbc.exceptions.jdbc4.CommunicationsException: Communications link failure
Solution:
At first, make sure that where we are running Sqoop there we can connect to the database from the node:
$ mysql –host=<IP Address> –database=test –user=<username> –password=<password>
Since this works, it rules out any problem with the client network configuration. Also with security/authentication configuration.
Afterwards, add the network port for the server to our my.cnf file /etc/my.cnf:
[mysqld]
port = xxxx
In addition, To connect by Sqoop, set up a user account. Now, to access the database over the network grant permissions to the user by following steps.
- At First, Log into MySQL as root mysql -u root -p<ThisIsMyPassword>.
- Issue the following command:
Technology is evolving rapidly!
Stay updated with DataFlair on WhatsApp!!
mysql> grant all privileges on test.* to ‘testuser’@’%’ identified by ‘testpassword’
Note: It will enable the testuser to connect to the MySQL server from any IP address.
d. Oracle: ORA-00933 error (SQL command not properly ended)
Problem 4.
We may encounter the below problem. while working with Oracle when the Sqoop command explicitly specifies the –driver <driver name> option. Moreover, the built-in connection manager selection defaults to the generic connection manager, when the driver option is included in the Sqoop command that causes this issue with Oracle.
Likewise, the built-in connection manager selection mechanism selects the Oracle-specific connection manager, if the driver option is not specified. That generates valid SQL for Oracle also uses the driver “oracle.jdbc.OracleDriver”.
ERROR manager.SqlManager: Error executing statement:
java.sql.SQLSyntaxErrorException: ORA-00933: SQL command not properly ended
Solution:
Re-run the Sqoop command after omitting the option –driver oracle.jdbc.driver.OracleDriver.
e. MySQL: Import of TINYINT(1) from MySQL behaves strangely
Problem 5.
While Sqoop may treat TINYINT(1) columns as booleans, that may cause issues with HIVE import. Generally, it occurs when by default the MySQL JDBC connector maps the TINYINT(1) to java.sql.Types.BIT. Basically, that Sqoop maps to boolean by default.
Solution:
However, there is one possible solution for it. To force MySQL JDBC connector to stop converting TINYINT(1) to java.sql.Types.BIT by adding tinyInt1isBit=false into your JDBC path. Basically, to create something like jdbc:mysql://localhost/test?tinyInt1isBit=false.
Conclusion
As a result, we have learned all the possible sqoop troubleshooting steps. However, that will help to troubleshoot all possible failure that we encounter through sqoop troubleshooting or Apache sqoop known issues. Still, if you want to ask any query, feel free to ask through the comment section. We will definitely, get back to you.
See also- Sqoop Architecture & Why Learn Sqoop
For reference
Your 15 seconds will encourage us to work even harder
Please share your happy experience on Google