Explain the process of inter cluster data copying


    • #6235
      DataFlair Team
      Spectator

      Explain the process of inter-cluster data copying in Hadoop.

    • #6237
      DataFlair Team
      Spectator

      Hadoop provides the distributed copy tool (DistCp) for copying large amounts of data within an HDFS cluster or between HDFS clusters. Under the hood, DistCp runs as a map-only MapReduce job: there are no reducers, and each mapper copies a share of the files, so the copy proceeds in parallel across the cluster. DistCp can also copy files from multiple sources into a single destination.

      Basic syntax: hadoop distcp <SOURCE> <DESTINATION>
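
As a minimal sketch of the basic syntax, the snippet below assembles two DistCp invocations: one copying a single directory between clusters, and one copying several sources into one destination. The NameNode addresses (nn1, nn2), the port 8020, and all paths are hypothetical placeholders, not values from this thread. The commands are only printed, so the snippet can be reviewed on a machine without Hadoop; run the printed lines on a host with a configured Hadoop client.

```shell
# Hypothetical NameNode addresses; replace with your own clusters.
SRC_NN="hdfs://nn1:8020"
DST_NN="hdfs://nn2:8020"

# Copy one directory from the first cluster to the second.
single_cmd="hadoop distcp $SRC_NN/data/logs $DST_NN/backup/logs"

# Copy several source directories into a single destination directory.
multi_cmd="hadoop distcp $SRC_NN/data/a $SRC_NN/data/b $DST_NN/backup"

# Print the commands for review instead of executing them.
echo "$single_cmd"
echo "$multi_cmd"
```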

      DistCp first uses copy-listing generator classes to build the list of files and directories to be copied from the source. The input-format and MapReduce components then perform the actual copy from the source to the destination path, consuming the listing file that was created during copy-listing generation.
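
The two phases described above (build a listing, then copy by consuming it) can be illustrated with a toy analogy on the local filesystem. This is not DistCp's code and does not use HDFS or MapReduce; the temporary directories and file names are made up for the illustration. In real DistCp, the listing is divided among the mappers, whereas this sketch uses a single loop.

```shell
# Toy local-filesystem analogy of DistCp's two phases (not DistCp itself).
work=$(mktemp -d)
mkdir -p "$work/src/sub" "$work/dst"
echo "a" > "$work/src/a.txt"
echo "b" > "$work/src/sub/b.txt"

# Phase 1: copy listing. Record every source file, relative to the source root.
listing="$work/listing.txt"
(cd "$work/src" && find . -type f | sort) > "$listing"

# Phase 2: copy. Consume the listing and copy each entry to the destination.
while IFS= read -r rel; do
    mkdir -p "$work/dst/$(dirname "$rel")"
    cp "$work/src/$rel" "$work/dst/$rel"
done < "$listing"

# Show what ended up in the destination.
copied=$(cd "$work/dst" && find . -type f | sort)
echo "$copied"
```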

      Some frequently used command options are listed below. All of them are optional.

      i) -atomic: Commits all changes at once or not at all, so no partial copy is left behind. Either every file is copied completely or no file is copied.
      ii) -overwrite: By default, DistCp skips files that already exist in the destination directory. With this option they are overwritten unconditionally.
      iii) -update: Copies only files that are missing from the destination or have changed, instead of all source files. This is very helpful for repeated copies because it minimizes the copy time.
      iv) -m <arg>: Lets the user specify the maximum number of mappers to be used.
      v) -delete: Deletes files that exist in the destination directory but not in the source directory. It is only applicable together with -update or -overwrite.
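
The options above can be combined as in the following sketch. The source and destination URIs and the mapper count of 20 are hypothetical values chosen for illustration. As before, the commands are printed rather than executed, so they can be checked before being run on a host with a Hadoop client.

```shell
# Hypothetical source and destination; replace with your own paths.
SRC="hdfs://nn1:8020/data/logs"
DST="hdfs://nn2:8020/backup/logs"

# Incremental sync: copy only missing or changed files, remove destination
# files that no longer exist in the source, and use at most 20 mappers.
sync_cmd="hadoop distcp -update -delete -m 20 $SRC $DST"

# Full re-copy: overwrite files that already exist in the destination.
overwrite_cmd="hadoop distcp -overwrite $SRC $DST"

# All-or-nothing copy: commit every file or none.
atomic_cmd="hadoop distcp -atomic $SRC $DST"

# Print the commands for review instead of executing them.
echo "$sync_cmd"
echo "$overwrite_cmd"
echo "$atomic_cmd"
```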
