I ran a Sqoop job and loaded data, then re-ran the job by mistake. What will happen?


    • #5291
      DataFlair Team
      Spectator

      I ran a Sqoop job and loaded the data, then re-ran the same job by mistake. What will happen in Sqoop, and what will happen to the Hive data?

    • #5292
      DataFlair Team
      Spectator

      If we run the same command again, it will error out with “Output directory hdfs://localhost:9000/sqoop2 already exists”. We need a new HDFS directory every time we run the command, because the output is generated by a MapReduce job, and a MapReduce job requires a fresh output directory on each run. The same applies when we load data into Hive from MySQL using Sqoop.
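
      A minimal sketch of the two runs, assuming a hypothetical MySQL database, user, and table (the connection string and names are illustrative, not from the thread):

      ```shell
      # First run: succeeds and writes the output under /sqoop2.
      sqoop import \
        --connect jdbc:mysql://localhost/mydb \
        --username myuser -P \
        --table employees \
        --target-dir hdfs://localhost:9000/sqoop2

      # Second run of the identical command: the MapReduce job refuses to start,
      # failing with "Output directory hdfs://localhost:9000/sqoop2 already exists".

      # One way around it: --delete-target-dir removes the target directory
      # before the import runs, so the same command can be re-run safely
      # (at the cost of replacing the previous run's output).
      sqoop import \
        --connect jdbc:mysql://localhost/mydb \
        --username myuser -P \
        --table employees \
        --target-dir hdfs://localhost:9000/sqoop2 \
        --delete-target-dir
      ```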

    • #5293
      DataFlair Team
      Spectator

      If the Sqoop command for Hive is run a second time by mistake after the data was loaded successfully on the first run, the command will fail while creating the Hive table, but the data will still be loaded into the HDFS staging directory.

      In short, importing to Hive uses HDFS as a staging area, and Sqoop deletes the staging directory after successfully copying the data to its final HDFS location. Cleaning up the staging/tmp files is the last stage of the Sqoop job, so if you try to list the tmp staging directory after a successful import, you won’t find it.

      Sqoop can generate a Hive table (using the create-hive-table tool) based on a table in an existing relational data source. With this tool, the job will fail if the target Hive table already exists.

      This can be avoided by using the --hive-import option, which automatically populates the metadata for the imported tables in the Hive metastore. If the table does not yet exist in Hive, Sqoop simply creates it based on the metadata fetched for your table or query. If the table already exists, Sqoop imports the data into the existing table. When creating a new Hive table, Sqoop converts the data type of each column in your source table to a type compatible with Hive.
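
      A sketch of a Hive import using --hive-import, again with hypothetical database and table names:

      ```shell
      # Imports into HDFS staging first, then moves the data into the Hive
      # warehouse and registers/updates the table in the metastore.
      sqoop import \
        --connect jdbc:mysql://localhost/mydb \
        --username myuser -P \
        --table employees \
        --hive-import \
        --hive-table employees

      # Re-running this appends into the existing Hive table; adding
      # --hive-overwrite replaces the table's contents instead.
      ```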
