Incremental load when the table has no unique ID

  • Author
    Posts
    • #5284
      DataFlair Team
      Spectator

      When loading incremental data, we don't have a unique ID column in our DB; we only have a column like name. How do we do an incremental load in that case?

    • #5287
      DataFlair Team
      Spectator

      For incremental loads, Sqoop provides an incremental import mode that retrieves only rows newer than a previously imported set of rows.
      Sqoop supports two incremental import modes: append and lastmodified.
      The --incremental (mode) argument specifies which type of incremental import to perform. Two further arguments go with it:
      --check-column (col): specifies the column to examine when determining which rows to import.
      --last-value (value): specifies the maximum value of the check column from the previous import.
      Use append mode for tables where new rows are continually added with increasing row_id values. Pass the column that holds the row ID to --check-column (col), and set --last-value (value) to the maximum value of that column from the previous import. Sqoop then imports only the rows where the check column has a value greater than the one given in --last-value.
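      As a rough sketch of append mode, the script below assembles a sqoop import command. The connection string, user, table (orders), check column (order_id), target directory and the run helper are all hypothetical placeholders rather than anything from the question. By default the script only prints the command; setting RUN_SQOOP=1 would execute it.

```shell
#!/bin/sh
# Dry-run helper (hypothetical): print the command unless RUN_SQOOP=1 is set.
run() {
  if [ "${RUN_SQOOP:-0}" = "1" ]; then "$@"; else echo "$@"; fi
}

# Import only rows whose order_id is greater than the value passed as $1.
# Host, database, user, table, column and directory are placeholder names.
import_new_orders() {
  run sqoop import \
    --connect "jdbc:mysql://dbhost/sales" \
    --username dbuser -P \
    --table orders \
    --target-dir /data/orders \
    --incremental append \
    --check-column order_id \
    --last-value "$1"
}

# 1000 stands in for the highest order_id seen by the previous import.
import_new_orders 1000
```

      Here 1000 is only an illustrative value; in practice it is the maximum of the check column reported by the previous run.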
      If the table has no primary or unique ID column, Sqoop offers an alternative strategy known as lastmodified mode. Use it when rows in the table are updated and there is a timestamp column that tracks the last updated/modified date. Rows where the check column holds a timestamp more recent than the one given in --last-value are imported. Note that a name column by itself cannot serve as the check column: the Sqoop documentation says the check column should not be a character type such as CHAR or VARCHAR.
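      A minimal sketch of lastmodified mode for a table without a unique ID might look like the following. It assumes a hypothetical timestamp column called updated_at; the connection string, user, table and directory are placeholders too. Since there is no key to merge on, the sketch passes --append, so an updated row lands in HDFS as an additional copy rather than replacing the older one. By default the script only prints the command.

```shell
#!/bin/sh
# Dry-run helper (hypothetical): print the command unless RUN_SQOOP=1 is set.
run() {
  if [ "${RUN_SQOOP:-0}" = "1" ]; then "$@"; else echo "$@"; fi
}

# Import rows whose updated_at timestamp is newer than the value passed as $1.
# updated_at, orders, dbhost, dbuser and /data/orders are placeholder names.
import_changed_orders() {
  run sqoop import \
    --connect "jdbc:mysql://dbhost/sales" \
    --username dbuser -P \
    --table orders \
    --target-dir /data/orders \
    --incremental lastmodified \
    --check-column updated_at \
    --last-value "$1" \
    --append
}

# The timestamp stands in for the value reported by the previous import.
# In dry-run mode it is printed without its quotes.
import_changed_orders "2024-01-01 00:00:00"
```

      If the table did have a usable key, --merge-key (col) could be given instead of --append so that updated rows replace their older versions.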
      At the end of an incremental import, the value to pass as --last-value for the next import is printed to the screen. Specify that value as --last-value on the next run so that only new or updated data is imported.
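      To avoid copying --last-value by hand, the import can be stored as a Sqoop saved job, which records the last value after each run and reuses it on the next one. The sketch below is illustrative only: the job name orders_incremental, the connection details and the updated_at column are assumptions, and the script just prints the commands unless RUN_SQOOP=1 is set.

```shell
#!/bin/sh
# Dry-run helper (hypothetical): print the command unless RUN_SQOOP=1 is set.
run() {
  if [ "${RUN_SQOOP:-0}" = "1" ]; then "$@"; else echo "$@"; fi
}

# Define the saved job once. Note the space between "--" and "import".
# Job name, host, user, table, column and directory are placeholder names.
create_orders_job() {
  run sqoop job --create orders_incremental -- import \
    --connect "jdbc:mysql://dbhost/sales" \
    --username dbuser -P \
    --table orders \
    --target-dir /data/orders \
    --incremental lastmodified \
    --check-column updated_at \
    --append
}

# Run the job; each execution picks up from the last value Sqoop stored.
exec_orders_job() {
  run sqoop job --exec orders_incremental
}

create_orders_job
exec_orders_job
```

      Running exec_orders_job on a schedule (for example from cron) would then import only what changed since the previous execution.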
