How to overwrite an existing output file/dir during execution of MapReduce jobs?


    • #4614
      DataFlair Team
      Spectator

      When submitting a MapReduce job, we have to supply a new output directory for the job to write to. If the output directory already exists, the job fails with an exception saying that the output directory already exists (FileAlreadyExistsException). I don't want to supply a new directory every time I submit a job. How can I configure MapReduce to overwrite an existing output directory?

    • #4615
      DataFlair Team
      Spectator

      There are two ways to delete the output directory before the job runs (not recommended):
      1) Using the shell:
      bin/hadoop fs -rm -r /path/to/your/output/
      (older releases used: bin/hadoop dfs -rmr /path/to/your/output/)
      2) Using the Java API:

      // The Configuration must point at your NameNode (e.g. via core-site.xml on the classpath)
      FileSystem fs = FileSystem.get(new Configuration());
      // The second argument (true) deletes the directory recursively
      fs.delete(new Path("/path/to/your/output"), true);
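
For context, the Java API call above is usually wrapped in a small helper that the driver calls before submitting the job. The following is only a sketch: the class name OutputDirCleaner and the method deleteIfExists are hypothetical names chosen for illustration, and the output path is taken from the command line rather than hard-coded.

```java
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

// Hypothetical helper, not part of Hadoop: removes a stale output directory.
public class OutputDirCleaner {

    /**
     * Recursively deletes outputDir if it exists.
     * Returns true if something was deleted, false if the path was absent.
     */
    public static boolean deleteIfExists(Configuration conf, Path outputDir) throws IOException {
        // Resolve the file system from the path itself, so hdfs:// and file:// paths both work
        FileSystem fs = outputDir.getFileSystem(conf);
        if (!fs.exists(outputDir)) {
            return false;
        }
        // true = recursive delete
        return fs.delete(outputDir, true);
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("Usage: OutputDirCleaner <output-dir>");
            return;
        }
        boolean deleted = deleteIfExists(new Configuration(), new Path(args[0]));
        System.out.println(deleted ? "Deleted " + args[0] : "Nothing to delete at " + args[0]);
    }
}
```

Calling deleteIfExists(conf, outputPath) in the driver just before the job is submitted keeps the cleanup and the job in one place. Note that this discards the results of the previous run.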

      If you want to overwrite the existing output instead:
      Subclass a Hadoop OutputFormat class and override checkOutputSpecs so that it no longer rejects an existing directory:

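      import java.io.IOException;
      import org.apache.hadoop.fs.FileSystem;
      import org.apache.hadoop.fs.Path;
      import org.apache.hadoop.mapred.FileAlreadyExistsException;
      import org.apache.hadoop.mapred.InvalidJobConfException;
      import org.apache.hadoop.mapred.JobConf;
      import org.apache.hadoop.mapred.TextOutputFormat;
      import org.apache.hadoop.mapreduce.security.TokenCache;

      public class OverwriteOutputDirOutputFile<K, V> extends TextOutputFormat<K, V> {

          @Override
          public void checkOutputSpecs(FileSystem ignored, JobConf job)
                  throws FileAlreadyExistsException, InvalidJobConfException, IOException {
              // Ensure that the output directory is set
              Path outDir = getOutputPath(job);
              if (outDir == null && job.getNumReduceTasks() != 0) {
                  throw new InvalidJobConfException("Output directory not set in JobConf.");
              }
              if (outDir != null) {
                  FileSystem fs = outDir.getFileSystem(job);
                  // Normalize the output directory
                  outDir = fs.makeQualified(outDir);
                  setOutputPath(job, outDir);

                  // Get a delegation token for the output directory's file system
                  TokenCache.obtainTokensForNamenodes(job.getCredentials(),
                          new Path[] {outDir}, job);

                  // The parent class checks for existence here and fails.
                  // That check is deliberately disabled so an existing directory is accepted:
                  // if (fs.exists(outDir)) {
                  //     throw new FileAlreadyExistsException("Output directory " + outDir +
                  //             " already exists");
                  // }
              }
          }
      }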

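      Then set this class as the output format in the job configuration, for example with setOutputFormat(OverwriteOutputDirOutputFile.class) on the JobConf.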
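
The class above is written against the older org.apache.hadoop.mapred API (its checkOutputSpecs takes a JobConf). If the job uses the newer org.apache.hadoop.mapreduce API instead, a rough equivalent, including the step of registering it on the job, might look like the sketch below. The names OverwriteJobSetup, OverwritingTextOutputFormat and configure are hypothetical, and Text key/value types are assumed purely for illustration.

```java
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.InvalidJobConfException;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.JobContext;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;
import org.apache.hadoop.mapreduce.security.TokenCache;

// Hypothetical driver-side setup for the new (mapreduce) API.
public class OverwriteJobSetup {

    /** Performs the usual output checks but skips the "already exists" failure. */
    public static class OverwritingTextOutputFormat<K, V> extends TextOutputFormat<K, V> {
        @Override
        public void checkOutputSpecs(JobContext job) throws IOException {
            Path outDir = getOutputPath(job);
            if (outDir == null) {
                throw new InvalidJobConfException("Output directory not set.");
            }
            // Get delegation tokens for the output directory's file system
            TokenCache.obtainTokensForNamenodes(job.getCredentials(),
                    new Path[] {outDir}, job.getConfiguration());
            // Existence check intentionally omitted so an existing directory is accepted
        }
    }

    /** Builds a Job that writes to outputDir even if that directory already exists. */
    public static Job configure(Configuration conf, Path outputDir) throws IOException {
        Job job = Job.getInstance(conf, "overwrite-output-example");
        job.setOutputKeyClass(Text.class);   // assumed types, adjust to the real job
        job.setOutputValueClass(Text.class);
        // This is the "set it in the job configuration" step
        job.setOutputFormatClass(OverwritingTextOutputFormat.class);
        FileOutputFormat.setOutputPath(job, outputDir);
        return job;
    }
}
```

One caveat with either API: skipping the check does not clean the directory. Files left by an earlier run that the new run does not rewrite (for example, extra part files from a run with more reducers) stay in place next to the new output.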
