How do you maintain data quality, and what validation is required?

  • Author
    Posts
    • #5269
      DataFlair Team
      Spectator

      If we have loaded data, how do we maintain its quality as per the business requirements, and how do we handle validation?

    • #5271
      DataFlair Team
      Spectator

      For validation, Sqoop provides validation arguments that can be passed along with the import/export arguments.
      The purpose is to validate the data copied, on either import or export, by comparing the row counts of the source and the target after the copy.
      The validation framework comes with default implementations, but its interfaces can be extended to allow custom implementations, which are passed in as part of the command-line arguments. It has limitations: validation currently only works for data copied from a single table into HDFS and cannot be used for data imported into Hive or HBase.
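
      For example, here is a minimal sketch of enabling validation on a single-table import; the connection string, credentials, table and directory names are placeholders. The --validator, --validation-threshold and --validation-failurehandler arguments are optional and shown only to illustrate where a custom class would be plugged in; if they are omitted, Sqoop falls back to its built-in row-count defaults.

      # Import one table into HDFS and compare source/target row counts after the copy
      sqoop import \
        --connect jdbc:mysql://dbserver/sales \
        --username sqoop_user -P \
        --table orders \
        --target-dir /user/hadoop/orders \
        --validate \
        --validator org.apache.sqoop.validation.RowCountValidator \
        --validation-threshold org.apache.sqoop.validation.AbsoluteValidationThreshold \
        --validation-failurehandler org.apache.sqoop.validation.AbortOnFailureHandler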

      Alternatively, for Hive we can Sqoop the data to HDFS first (for example into the location backing an external Hive table), use the --validate feature on that import, and if it passes validation, load the data into the Hive tables.
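
      A rough sketch of that workaround, with hypothetical connection details, paths and table names, and assuming the Hive table orders_hive already exists:

      # Step 1: import into a staging directory on HDFS with validation enabled
      sqoop import \
        --connect jdbc:mysql://dbserver/sales \
        --username sqoop_user -P \
        --table orders \
        --target-dir /user/hadoop/staging/orders \
        --validate

      # Step 2: only if the import passed validation, load the staged files into Hive
      hive -e "LOAD DATA INPATH '/user/hadoop/staging/orders' INTO TABLE orders_hive;"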

      Of course, there will be other ways to do the validation that overcome the limitations of the --validate option in a better way; I will update this answer if I find more on this.
