I ran a Sqoop job and loaded data, then re-ran the job by mistake. What will happen?


    • #5291
      DataFlair Team
      Spectator

      I ran a Sqoop job and loaded the data, then re-ran the same job by mistake. What will happen in Sqoop, and what will happen to the Hive data?

    • #5292
      DataFlair Team
      Spectator

      If we run the same command again, it will error out with “Output directory hdfs://localhost:9000/sqoop2 already exists”. We need a new HDFS directory every time we run the command, because the output is generated by a MapReduce job, and a MapReduce job requires a fresh output directory on each run. The same applies when we load data into Hive from MySQL using Sqoop.
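
      A minimal sketch of the two runs, assuming a hypothetical MySQL database, user, and table (the connection string and names are illustrative, not from the thread):

      ```shell
      # First run: succeeds and writes the output under /sqoop2.
      sqoop import \
        --connect jdbc:mysql://localhost/mydb \
        --username myuser -P \
        --table employees \
        --target-dir hdfs://localhost:9000/sqoop2

      # Second run of the identical command: the MapReduce job refuses to start,
      # failing with "Output directory hdfs://localhost:9000/sqoop2 already exists".

      # One way around it: --delete-target-dir removes the target directory
      # before the import runs, so the same command can be re-run safely
      # (at the cost of replacing the previous run's output).
      sqoop import \
        --connect jdbc:mysql://localhost/mydb \
        --username myuser -P \
        --table employees \
        --target-dir hdfs://localhost:9000/sqoop2 \
        --delete-target-dir
      ```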

    • #5293
      DataFlair Team
      Spectator

      If the Sqoop command for Hive is run a second time by mistake after the data was loaded successfully on the first run, the command will fail while creating the Hive table, but the data will still be loaded into the HDFS staging directory.

      In short, importing to Hive uses HDFS as a staging area, and Sqoop deletes the staging directory after successfully copying the data to its final HDFS location. Cleaning up the staging/tmp files is the last stage of the Sqoop job, so if you try to list the tmp staging directory after a successful import, you won’t find it.

      Sqoop can generate a Hive table (using the create-hive-table tool) based on a table in an existing relational data source. With this tool, the job will fail if the target Hive table already exists.

      This can be avoided by using the --hive-import option, which automatically populates the metadata for the imported tables in the Hive metastore. If the table does not yet exist in Hive, Sqoop simply creates it based on the metadata fetched for your table or query. If the table already exists, Sqoop imports the data into the existing table. When creating a new Hive table, Sqoop converts the data type of each column in your source table to a type compatible with Hive.
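
      A sketch of a Hive import using --hive-import, again with hypothetical database and table names:

      ```shell
      # Imports into HDFS staging first, then moves the data into the Hive
      # warehouse and registers/updates the table in the metastore.
      sqoop import \
        --connect jdbc:mysql://localhost/mydb \
        --username myuser -P \
        --table employees \
        --hive-import \
        --hive-table employees

      # Re-running this appends into the existing Hive table; adding
      # --hive-overwrite replaces the table's contents instead.
      ```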
